Preface



This volume contains the proceedings of the 21st international conference on the 
Foundations of Software Technology and Theoretical Computer Science 
(FSTTCS 2001), organized under the auspices of the Indian Association for 
Research in Computing Science (IARCS).

This year’s conference attracted 73 submissions from 20 countries. Each
submission was reviewed by at least three independent referees. In a departure from
previous conferences, the final selection of the papers making up the program was 
done through an electronic discussion spanning two weeks, without a physical 
meeting of the Program Committee (PC). 

Since the PC of FSTTCS is distributed across the globe, it is very difficult to 
fix a meeting whose time and venue are convenient for a substantial fraction of the
PC. Given this, it was felt that an electronic discussion would enable all members 
to participate on a more equal footing in the final selection. All reviews, scores, 
and comments were posted on a secure website, with a mechanism for making 
updates and automatically sending notifications by email to relevant members 
of the PC. All PC members participated actively in the discussion. The general 
feedback on the arrangement was very positive, so we hope to continue this in 
future years. 

We had five invited speakers this year: Eric Allender, Sanjeev Arora, David 
Harel, Colin Stirling, and Uri Zwick. We thank them for having readily accepted 
our invitation to talk at the conference and for providing abstracts (and even 
full papers) for the proceedings. 

Two satellite workshops were organized in conjunction with FSTTCS this 
year. The conference was preceded by a two-day workshop (December 11-12) on 
Quantum Computing in Bangalore, organized by Harry Buhrman and Umesh 
Vazirani. Following the conference, on December 17-18, there was a two-day 
workshop on Reasoning about Large and Infinite State Systems in the nearby 
city of Chennai (Madras), organized by P.S. Thiagarajan. 

We thank all the reviewers and PC members, without whose dedicated effort 
the conference would not have been possible. We thank the Organizing
Committee for making the arrangements for the conference. As usual, Alfred Hofmann
and his team at Springer-Verlag were very helpful in preparing the proceedings.
When Worlds Collide:
Derandomization, Lower Bounds, 
and Kolmogorov Complexity 



Eric Allender 



Department of Computer Science 
Rutgers University 
Piscataway, NJ 08854-8019, USA 
allender@cs.rutgers.edu
http://www.cs.rutgers.edu/~allender



Abstract. This paper has the following goals: 

— To survey some of the recent developments in the field of derandomization.

— To introduce a new notion of time-bounded Kolmogorov complexity
(KT), and show that it provides a useful tool for understanding
advances in derandomization, and for putting various results in context.

— To illustrate the usefulness of KT, by answering a question that has 
been posed in the literature, and 

— To pose some promising directions for future research. 



1 Introduction 

If one were to have taken a poll among complexity theoreticians in 1985, asking 
if probabilistic algorithms are more powerful than deterministic ones (i.e., if P 
= BPP) I feel sure that a clear majority would have thought it more likely that 
BPP ≠ P. Similarly, during the late 1980’s, when the “Arthur-Merlin” classes MA
and AM were introduced [1] it is fair to say that most people in the community 
thought it most likely that MA and AM were not merely new names for NP. A 
poll taken today would, I believe, show a majority believing BPP = P and MA 
= AM = NP. 

Such swings of opinion are a sure sign of important progress in the field. 
(Consider the likely outcome of taking similar polls about whether NL=coNL, 
or whether IP=PSPACE, before and after the announcements that these
problems had been settled [2, 3, 4, 5].) In the case of BPP, MA, and AM, our new
understanding of these classes can be credited to advances in the field of
derandomization, a field that is so large and active that I will not pretend to be able
to survey it here.

Rather, I will focus on a few recent and exciting developments in the field 
of derandomization, and I will show how these developments can be understood 
in the context of time-bounded Kolmogorov complexity. This has the additional
benefit of shedding light on the process of trying to prove lower bounds on the
circuit complexity of various computational problems. In particular, I will
highlight connections that exist between derandomization, Kolmogorov complexity,
and the natural proofs framework of [6].

In this article, most proofs are sketched or omitted. A more detailed article 
describing this work and related results is in preparation with Detlef Ronneburger.



2 Ancient History 

I assume that the reader is familiar with the complexity classes P, NP, MA, AM,
and BPP. As has become customary, E and NE refer to $\mathrm{DTIME}(2^{O(n)})$ and
$\mathrm{NTIME}(2^{O(n)})$, respectively, whereas EXP and NEXP refer to $\mathrm{DTIME}(2^{n^{O(1)}})$
and $\mathrm{NTIME}(2^{n^{O(1)}})$, respectively. Computations performed by circuits of AND,
OR, and NOT gates having depth $O(1)$ and polynomial size define the class
$\mathrm{AC}^0$. When I refer to $\mathrm{AC}^0$ I shall always refer to DLOGTIME-uniform $\mathrm{AC}^0$
(sometimes also called $U_E$-uniform $\mathrm{AC}^0$) unless I explicitly mention non-uniform
circuits. Occasionally, it is useful to refer to circuit size bounds other than
polynomial. For instance, $\mathrm{AC}^0(2^{\epsilon n})$ refers to the class of languages accepted by circuit
families of AND, OR, and NOT gates having depth $O(1)$ and size bounded by
$2^{\epsilon n}$. P/poly is the class of languages for which there exist circuit families of
polynomial size (with no restriction on circuit depth). For background on complexity
classes and uniformity, consult standard texts such as [7,8,9].

A set L is P-printable if there is a function that, on input n, produces in time
$n^{O(1)}$ a list of all elements of L having length at most n. P-printable sets were
defined in [10] and were studied in [11] and elsewhere. 
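As a small illustration (a hypothetical example chosen here, not one taken from [10] or [11]), the set of binary strings containing at most two 1s is P-printable: it has $O(m^2)$ strings of each length m, so all of its elements of length at most n can be written out in time polynomial in n. A minimal Python sketch:

```python
from itertools import combinations
from typing import List


def print_sparse_strings(n: int, max_ones: int = 2) -> List[str]:
    """List every binary string of length at most n with at most max_ones 1s.

    For a fixed max_ones there are O(n^(max_ones + 1)) such strings, and each
    is built in O(n) time, so the whole listing takes time polynomial in n.
    """
    listing: List[str] = []
    for length in range(n + 1):
        for k in range(min(max_ones, length) + 1):
            # Choose the positions of the 1s; all other positions stay 0.
            for ones in combinations(range(length), k):
                bits = ["0"] * length
                for pos in ones:
                    bits[pos] = "1"
                listing.append("".join(bits))
    return listing


if __name__ == "__main__":
    print(len(print_sparse_strings(3)))  # 1 + 2 + 4 + 7 = 14 strings
```

By contrast, a set with exponentially many strings of some length cannot be P-printable, since the list itself would be too long to write down.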

It remains an open question if NEXP is contained in P/poly. Perhaps more
surprisingly, it remains unknown if there exists some $\epsilon < 1$ such that NEXP
is contained in $\mathrm{AC}^0(2^{\epsilon n})$. (The lower bounds for the Parity and Majority
functions [12] provide examples of problems in P that lie outside of non-uniform
$\mathrm{AC}^0(2^{n^{o(1)}})$, but in order to find examples of problems requiring larger circuits,
currently we are forced to look outside of NEXP.)

The exponential-time analog of the P=NP problem is the question of whether 
or not EXP=NEXP. The reader probably knows that P=NP if and only if
accepting computation paths of NP machines can be found in polynomial time. The
situation at the level of exponential time is less clear, however. This motivates 
the following definition. 

A NEXP search problem is a problem where, for a given nondeterministic
Turing machine M running in time $2^{n^c}$ for some c, one takes as input a string x
and produces as output an accepting computation of M on input x (if such a
string exists). (In the case where the running time is $2^{cn}$, we call this an NE
search problem.) It is clear that if every NEXP search problem is solvable in
deterministic exponential time, then NEXP = EXP; however, the converse is not
known to hold [13,14,15].
2.1 Levin’s Kt Complexity

There can be no doubt that the question of how to find accepting computations of 
a nondeterministic machine is of central importance to computer science. Levin
observed that there is an easy-to-compute ordering on $\Sigma^*$ giving an essentially
optimal search strategy [16]. This can be stated more formally by means of
Levin’s formulation of time-bounded Kolmogorov complexity.

Definition 1 (Levin). Let U be a universal Turing machine. Define $\mathrm{Kt}(x)$ to
be $\min\{|d| + \log t : U(d) = x \text{ in at most } t \text{ steps}\}$.

Note that $\log |x| \le \mathrm{Kt}(x) \le |x| + O(\log |x|)$.
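Levin’s search strategy can be sketched as follows: in phase k, every description d with $|d| \le k$ is (re)started and run for $2^{k-|d|}$ steps, so a description that needs t steps is first run to completion in phase $|d| + \lceil \log t \rceil$, the quantity minimized in Definition 1. This is a toy sketch under stated assumptions: there is no real universal machine U here; a hypothetical finite list of “programs” (each a description length plus a step-by-step Python generator) stands in for descriptions, and `verify` stands in for checking an accepting computation.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Toy stand-in for descriptions run on U (hypothetical model, not a real
# universal machine): a program is (description_length, factory). Calling
# factory() starts a fresh run, which yields None once per step of work and
# then yields its output string.
Program = Tuple[int, Callable[[], Iterator[Optional[str]]]]


def levin_search(programs: List[Program],
                 verify: Callable[[str], bool],
                 max_phase: int = 40) -> Optional[Tuple[str, int]]:
    """Return (output, phase) for the first output that passes verify, else None.

    In phase k, each program of description length l <= k is restarted and
    given 2**(k - l) steps, so a program needing t steps first finishes in
    phase l + ceil(log2(t)), mirroring the |d| + log t of Definition 1.
    """
    for phase in range(max_phase + 1):
        for length, factory in programs:
            if length > phase:
                continue
            run = factory()
            for _ in range(2 ** (phase - length)):
                try:
                    out = next(run)
                except StopIteration:
                    break  # program halted without producing output
                if out is not None:
                    if verify(out):
                        return out, phase
                    break  # rejected candidate; move on to the next program
    return None


def fast_wrong() -> Iterator[Optional[str]]:
    yield "0000"            # answers after 1 step, but incorrectly


def slow_correct() -> Iterator[Optional[str]]:
    for _ in range(5):      # five steps of "work"
        yield None
    yield "1011"            # correct answer on step 6


if __name__ == "__main__":
    programs: List[Program] = [(1, fast_wrong), (3, slow_correct)]
    print(levin_search(programs, lambda s: s == "1011"))  # ('1011', 6)
```

Restarting every program in each phase at most doubles the total work, because the budgets grow geometrically; this is why the ordering by $|d| + \log t$ is optimal up to such constant factors.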

The elements of $\Sigma^*$ can be enumerated in order of increasing $\mathrm{Kt}(x)$, and
Levin observed that this ordering yields essentially the fastest way to search for 
accepting computations. The following definitions (from [17,18]) will be useful 
in stating certain correspondences. 

Definition 2. Let $L \subseteq \Sigma^*$. Define $\mathrm{Kt}_L(n)$ to be $\min\{\mathrm{Kt}(x) : x \in L^{=n}\}$.

(In [17,18] the function $\mathrm{Kt}_L$ was called $K_L$.) Here and elsewhere, $L^{=n}$ is the
set of all strings in L of length n. If there is no string in L of length n, then
$\mathrm{Kt}_L(n)$ is undefined. When considering the rate of growth of a function $\mathrm{Kt}_L(n)$,
the undefined values are not taken into consideration. Observe that, for every
language L, $\log n \le \mathrm{Kt}_L(n) \le n + O(\log n)$.

At this point, we can state some connections between the complexity of
NEXP search problems and time-bounded Kolmogorov complexity. The following
statements are almost identical to observations in [17,18], and they derive
from Levin’s original insight about Kt-complexity.

Theorem 1. The following are equivalent: 

— All NEXP search problems are EXP-solvable.

— For all L in P, $\mathrm{Kt}_L(n) = \log^{O(1)} n$.

Theorem 2. The following are equivalent: 

— For every $\epsilon > 0$, all NE search problems are solvable in time $2^{2^{\epsilon n}}$.

— For every $\epsilon > 0$ and for every L in P, $\mathrm{Kt}_L(n) = O(n^{\epsilon})$.

Since most complexity theoreticians conjecture that NEXP requires doubly 
exponential time, it follows that most complexity theoreticians conjecture that 
there are languages L in P such that $\mathrm{Kt}_L(n)$ grows nearly linearly.

On the other hand, most complexity theoreticians also conjecture that
pseudorandom generators exist. As the following paragraphs explain, it follows that
most complexity theoreticians need to conjecture that $\mathrm{Kt}_L(n)$ grows slowly if L
is in P (or even in P/poly) and has very many strings in it.

For this paper, it will not be necessary to give precise definitions of
pseudorandom generators. (The cited references provide details for the interested
reader.) Recall only that a pseudorandom generator takes a short seed as input
and produces a long pseudorandom output. In many applications we are
interested in the case where pseudorandom output of length n is produced in time
polynomial in n. In this case, the output of a pseudorandom generator is a
string of small Kt-complexity: it is determined by the seed and produced in time
$n^{O(1)}$, so its Kt-complexity is at most the seed length plus $O(\log n)$.

A pseudorandom generator is said to be secure if, for any circuit C of size
s, the probability that C accepts a random string of length n is within $\epsilon$ of the
probability that C accepts a pseudorandom string of length n. (The parameters
s and $\epsilon$ vary according to the security requirements.) Thus, in particular, if
one has a secure pseudorandom generator and a circuit of polynomial
size that accepts at least $2^n/n$ inputs of length n (i.e., the circuit accepts a
“dense” subset of $\Sigma^n$), then it should also accept quite a few pseudorandom
strings of length n. Hence it was observed in [17,18] that if secure pseudorandom
generators exist, then for all dense languages L in P/poly, $\mathrm{Kt}_L(n) = O(n^{\epsilon})$
or $\mathrm{Kt}_L(n) = O(\log n)$ (depending on the security assumptions that were made
about the pseudorandom generators).

We have seen that there is reason to be interested in the rate of growth that
is possible for functions $\mathrm{Kt}_L$, when L is in P. One might also ask about $\mathrm{Kt}_L$,
when L is chosen from some other complexity class. The following observation
(essentially identical to an observation credited to Russo in [11]) is of interest in 
this regard. 

Theorem 3. The following are equivalent: 

— For every L in P, $\mathrm{Kt}_L(n) = O(\log n)$.

— For every L in NP, $\mathrm{Kt}_L(n) = O(\log n)$.

— For every L in $\mathrm{AC}^0$, $\mathrm{Kt}_L(n) = O(\log n)$.

Similarly (by simply considering the (string,witness) pairs for a language in 
NP) it is easy to show: 

Theorem 4. The following are equivalent: 

— There is a language L in P and an $\epsilon > 0$ such that for all large n $\mathrm{Kt}_L(n)$ is
defined and $\mathrm{Kt}_L(n) > n^{\epsilon}$.

— There is a language L in NP and an $\epsilon > 0$ such that for all large n $\mathrm{Kt}_L(n)$
is defined and $\mathrm{Kt}_L(n) > n^{\epsilon}$.

— There is a language L in $\mathrm{AC}^0$ and an $\epsilon > 0$ such that for all large n $\mathrm{Kt}_L(n)$
is defined and $\mathrm{Kt}_L(n) > n^{\epsilon}$.



3 A Brief Discussion of Derandomization 

One of the best examples of derandomization is provided by the lovely work of 
Impagliazzo and Wigderson: 

Theorem 5. [19] If there is a language $A \in \mathrm{E}$ and a constant $\epsilon > 0$ such that,
for all large n, there is no circuit of size $2^{\epsilon n}$ accepting $A^{=n}$, then BPP = P.
Alternate proofs of this theorem of [19] can be found in [20].
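Theorem 5 is proved by building, from the hard set A, a pseudorandom generator with seed length $O(\log n)$; the final step, common to such results, is to run the randomized algorithm on every generator output and take a majority vote. A minimal Python sketch of that last step only, with hypothetical names: `identity_generator` is a placeholder with no stretch (so the “simulation” here is just exhaustive search over the random bits), and the toy algorithm is a randomized polynomial identity test modulo a prime p.

```python
from itertools import product
from typing import Any, Callable, List, Tuple

Bits = Tuple[int, ...]


def derandomize(algorithm: Callable[[Any, Bits], bool],
                x: Any,
                generator: Callable[[Bits], Bits],
                seed_len: int) -> bool:
    """Majority vote of algorithm(x, generator(s)) over all 2**seed_len seeds.

    With seed_len = O(log n) the loop runs polynomially many times. The answer
    is correct only if the generator's outputs fool the algorithm on input x,
    which is the (assumed) property a pseudorandom generator must provide.
    """
    accepts = 0
    total = 0
    for seed in product((0, 1), repeat=seed_len):
        total += 1
        if algorithm(x, generator(seed)):
            accepts += 1
    return 2 * accepts > total


def identity_generator(seed: Bits) -> Bits:
    # Hypothetical placeholder with no stretch: the "pseudorandom" bits are
    # the seed itself, so derandomize() degenerates to exhaustive search.
    return seed


def poly_identity_test(x: Tuple[List[int], List[int], int], bits: Bits) -> bool:
    """Toy randomized test: do polynomials a and b agree at a random point mod p?

    Coefficient lists start with the constant term. If a == b the test always
    accepts; otherwise it accepts only at roots of a - b.
    """
    a, b, p = x
    r = 0
    for bit in bits:              # read the random bits as a binary number
        r = 2 * r + bit
    r %= p

    def evaluate(coeffs: List[int]) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * r + c) % p
        return acc

    return evaluate(a) == evaluate(b)
```

For example, `derandomize(poly_identity_test, ([1, 2, 1], [1, 0, 1], 101), identity_generator, 5)` returns False: the two polynomials differ by $2x$, which vanishes at only one of the 32 evaluation points 0, ..., 31.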

The hypothesis to Theorem 5 seems very likely to be true. Indeed, it seems 
possible that there are problems in smaller complexity classes than E (including 
perhaps SAT itself) that require exponential-size circuits. It also seems likely that 
E contains problems that are significantly harder than this hypothesis assumes. 
For instance, it seems likely that E requires circuits of this size, even if the 
circuits are allowed to have “oracle gates” for SAT [21]. It is shown in [22] 
that this stronger hypothesis about the complexity of problems in E implies 
that AM = MA = NP. In [22] the same conclusion AM = NP is shown to follow
from a weaker assumption: namely, that the “hard” set A is in $\mathrm{NE} \cap \mathrm{coNE}$. This
hypothesis was subsequently weakened further by [23].

If one is willing to settle for a weaker conclusion than BPP=P, then it suffices 
to start with a much weaker assumption. 

Definition 3. Let t be a time bound. Define io-DTime(t(n)) to be the class of
languages L for which there exists a language $A \in \mathrm{DTime}(t(n))$ with the property
that, for infinitely many n, $A^{=n} = L^{=n}$.

Theorem 6. [24] If EXP is not in P/poly, then

$\mathrm{BPP} \subseteq \bigcap_{\epsilon > 0} \text{io-DTime}(2^{n^{\epsilon}})$.



This is an example of “limited derandomization”; while it does not yield a
polynomial-time algorithm for BPP problems, it does improve over the
trivial exponential-time upper bound (at least for infinitely many input lengths). It
is difficult to imagine that languages in BPP could be easy to recognize on some 
input lengths and difficult to recognize on other input lengths. Nonetheless, it 
is still not known if “io-Dtime” can be replaced by “Dtime” in the preceding 
theorem. 

The conclusion of Theorem 6 implies that EXP is not contained in BPP,
but it is not known if the ability to do limited derandomization implies that
EXP is not in P/poly. That is, lower bounds are sufficient for limited
derandomization, but lower bounds are not known to be necessary for derandomization.
Stated another way, it would be interesting to know if the nonexistence of
circuit lower bounds for problems in EXP (i.e., $\mathrm{EXP} \subseteq \mathrm{P/poly}$) implies that no
derandomization is possible (i.e., that BPP = EXP).

In contrast, it is shown by Impagliazzo, Kabanets, and Wigderson (in part 
credited by them to van Melkebeek) that lower bounds for NEXP are necessary 
and sufficient for limited derandomization of MA: 

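Theorem 7. [25] The following are equivalent:

- $\mathrm{NEXP} \not\subseteq \mathrm{P/poly}$

- $\mathrm{NEXP} \neq \mathrm{MA}$

- $\mathrm{MA} \subseteq \bigcap_{\epsilon > 0} \text{io-}(\mathrm{NTime}(2^{n^{\epsilon}})/n^{\epsilon})$.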
Central to the approach taken in [25] is the strategy of searching for solutions
to NEXP search problems by looking for “easy witnesses” first, where a string
of length $2^n$ is considered to be an “easy witness” if it is the truth table of a
function with small circuit complexity.

This leads us to the following two questions: 

Question 1: We already know that Kt complexity defines an optimal search 
strategy for NEXP search problems. How does this compare to the “easy witness” 
strategy of [25]? 

Question 2: Is the notion of “easy witnesses” also related to a variant of 
Kolmogorov complexity? 

4 A New Definition of Time-Bounded Kolmogorov Complexity

To try to find an answer to these questions, we are led to a new definition of 
time-bounded Kolmogorov complexity. The definition below is a slight variant 
on a definition introduced by Antunes, Fortnow, and van Melkebeek in [26]. 

Definition 4. Let U be a universal Turing machine. Define $\mathrm{KT}(x)$ to be
$\min\{|d| + t : \text{for all } i \le |x|,\ U(d, i) = x_i \text{ in at most } t \text{ steps}\}$.

(A note on notation: The capital “T” in KT calls attention to the fact that the
time bound has much more influence than in Kt complexity.) In the preceding
definition, note that the string d is “almost” a description of x in the usual
sense; it is possible that U cannot determine the length of x from d. Of course,
by adding $O(\log n)$ to the length of d, we could require that $U(d, i) = \bot$ for all
$i > |x|$ (thus obtaining a description of x in the usual sense). Logarithmic terms
are insignificant for our applications, and thus we use the simpler definition.

Observe that the KT and Kt measures can be generalized to obtain $\mathrm{KT}^A$
and $\mathrm{Kt}^A$ for oracles A, by giving U access to an oracle. It turns out that this is
useful in clarifying the relationship between Kt and KT complexity. 

Theorem 8 (Ronneburger). [27] Let A be complete for E under linear-time
reductions. Then there is a $k \in \mathbb{N}$ such that $\mathrm{Kt}(x)/k \le \mathrm{KT}^A(x) \le k\,\mathrm{Kt}(x)$. In
other words, $\mathrm{Kt} = \Theta(\mathrm{KT}^A)$.

At first glance, the definition of KT complexity seems to have quite a different 
flavor than Levin’s Kt complexity, especially because Levin’s definition uses the 
log of the running time, and KT complexity uses running time. However, the 
preceding theorem shows that one can view Kt complexity as merely a variation 
of KT complexity; Kt complexity is KT complexity relative to E. 

Next, let us observe that KT complexity captures the essential character of 
the “easy witness” approach of [25]. 

Theorem 9. Let x be a string of length $2^n$ (which we can view as the truth table
of a function f on inputs of length n), and let A be any oracle. If $\mathrm{KT}^A(x) \le m$,
then there is an A-circuit of size $m^{O(1)}$ computing f. Conversely, if there is an
A-circuit of size m computing f, then $\mathrm{KT}^A(x) = O(m \log m)$.
That is, $\mathrm{KT}(x)$ is an approximation to the circuit size required to compute
the function given by the truth table x.
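The converse direction of Theorem 9 can be pictured concretely: if d encodes a circuit of size m, then bit i of the truth table is obtained by evaluating the circuit on the n-bit binary representation of i, in time roughly proportional to m. A minimal Python sketch under stated assumptions: the gate-list encoding is hypothetical, bits are indexed from 0 rather than 1, and oracle gates are not modeled.

```python
from typing import List, Tuple

# Hypothetical gate-list encoding of a circuit on inputs x_0..x_{n-1}:
# ("IN", j), ("NOT", g), ("AND", g, h) or ("OR", g, h), where g and h are
# indices of earlier gates; the last gate is the output gate.
Gate = Tuple
Circuit = List[Gate]


def truth_table_bit(circuit: Circuit, n: int, i: int) -> int:
    """Bit i (0-indexed) of the truth table: the circuit's value on binary(i).

    One pass over the gate list, so the work is proportional to the number of
    gates (plus O(n) to unpack i): a small circuit yields a short description
    from which each bit of the truth table is computed quickly.
    """
    inputs = [(i >> (n - 1 - j)) & 1 for j in range(n)]  # x_0 is the high bit
    values: List[int] = []
    for gate in circuit:
        op = gate[0]
        if op == "IN":
            values.append(inputs[gate[1]])
        elif op == "NOT":
            values.append(1 - values[gate[1]])
        elif op == "AND":
            values.append(values[gate[1]] & values[gate[2]])
        elif op == "OR":
            values.append(values[gate[1]] | values[gate[2]])
        else:
            raise ValueError(f"unknown gate type {op!r}")
    return values[-1]


def truth_table(circuit: Circuit, n: int) -> str:
    """The full 2**n-bit string x whose KT complexity Theorem 9 relates to size."""
    return "".join(str(truth_table_bit(circuit, n, i)) for i in range(2 ** n))


# XOR of two inputs, built from AND/OR/NOT gates.
XOR2: Circuit = [
    ("IN", 0), ("IN", 1),        # gates 0, 1
    ("NOT", 0), ("NOT", 1),      # gates 2, 3
    ("AND", 0, 3),               # gate 4: x_0 and not x_1
    ("AND", 2, 1),               # gate 5: not x_0 and x_1
    ("OR", 4, 5),                # gate 6: output
]
```

Here `truth_table(XOR2, 2)` is the 4-bit string `"0110"`; the point of the design is that `truth_table_bit` never builds the whole table, which is what lets a description of size roughly m serve all $2^n$ queries.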

It will be useful for us to define an analog to the function $\mathrm{Kt}_L$.

Definition 5. Let $L \subseteq \Sigma^*$. Define $\mathrm{KT}_L(n)$ to be $\min\{\mathrm{KT}(x) : x \in L^{=n}\}$.

The following two theorems have the same easy proofs as the corresponding
observations for $\mathrm{Kt}_L$.

Theorem 10. The following are equivalent: 

— For every L in P, $\mathrm{KT}_L(n) = \log^{O(1)} n$.

— For every L in NP, $\mathrm{KT}_L(n) = \log^{O(1)} n$.

— For every L in $\mathrm{AC}^0$, $\mathrm{KT}_L(n) = \log^{O(1)} n$.

Theorem 11. The following are equivalent: 

— There is a language L in P and an $\epsilon > 0$ such that for all large n $\mathrm{KT}_L(n)$
is defined and $\mathrm{KT}_L(n) > n^{\epsilon}$.

— There is a language L in NP and an $\epsilon > 0$ such that for all large n $\mathrm{KT}_L(n)$
is defined and $\mathrm{KT}_L(n) > n^{\epsilon}$.

— There is a language L in $\mathrm{AC}^0$ and an $\epsilon > 0$ such that for all large n $\mathrm{KT}_L(n)$
is defined and $\mathrm{KT}_L(n) > n^{\epsilon}$.



4.1 How Do KT and Kt Compare? 

Proposition 1. |Kt(a;) < KT(a;) < 

It is reasonable to conjecture that one of the two inequalities above is essentially
tight. This leads us to formulate the two hypotheses below.

Hypothesis A: KT(x) =

Hypothesis B: There is some $\epsilon > 0$ such that for all large n there is a string
$x \in \Sigma^n$ such that KT(x) >

Hypothesis A says that the first inequality is essentially tight, and Hypothesis 
B says that the second inequality is essentially tight. 

In the following section, we will see that each of these hypotheses has already 
been the object of a great deal of attention in the complexity theory community. 

5 The Hypotheses Are Familiar 

First, let us consider Hypothesis A, which states that there is not much difference 
between KT and Kt. This is equivalent to a number of other statements, all of 
which seem highly unlikely. 

Theorem 12. The following are equivalent. 

1. Hypothesis A 

2. $\mathrm{EXP} \subseteq \mathrm{P/poly}$.

3. For all P-printable sets L, $\mathrm{KT}_L(n) = \log^{O(1)} n$.

4. There exists a dense set $L \in \mathrm{P/poly}$ and some $\epsilon > 0$ such that for all large
n, $\mathrm{Kt}_L(n)$ is defined and $\mathrm{Kt}_L(n) > n^{\epsilon}$.

Proof. The equivalence (1 $\Leftrightarrow$ 2) is straightforward. (Merely examine the KT
complexity of prefixes of the characteristic sequence of a complete language for
E.) Essentially the same observations suffice to show (3 $\Leftrightarrow$ 1).

The proof of Theorem 6 (see [24]) actually establishes the implication (4 $\Rightarrow$ 2).
That is, the authors of [24] show that if $\mathrm{EXP} \not\subseteq \mathrm{P/poly}$, then the
Nisan-Wigderson pseudorandom generator ([28]) can be used with seed length $n^{\epsilon}$ to
simulate any BPP algorithm. In fact, their analysis shows that the generator
must output a string accepted by any small circuit that accepts many strings,
which is precisely the negation of condition 4.

For the implication (2 $\Rightarrow$ 4) we use an argument of [29]. Assume that
condition 4 fails to hold. That is, for every dense $L \in \mathrm{P/poly}$ and every $\epsilon > 0$,
there are infinitely many $x \in L$ such that $\mathrm{Kt}(x) < |x|^{\epsilon}$. This yields a “hitting set
generator,” i.e., a function $G_{\epsilon}$ running in time $2^{O(n^{\epsilon})}$ that, on input n, outputs
a list of strings such that any dense language in P/poly must accept one of the
strings in the list. The complement of the range of $G_{\epsilon}$ is dense, and is in E. If
it were in P/poly, it would contradict the existence of $G_{\epsilon}$. Hence
$\mathrm{EXP} \not\subseteq \mathrm{P/poly}$, so condition 2 fails as well.

The equivalence (3 $\Leftrightarrow$ 4) is intriguing. It says that assuming that one
class of sets has small Kolmogorov complexity is equivalent to another class
of sets having large Kolmogorov complexity (using slightly different notions of
time-bounded Kolmogorov complexity). This tension between high and low
complexity is responsible for many of the most exciting aspects of derandomization.

Klivans and van Melkebeek show that the condition $\mathrm{EXP} \not\subseteq \mathrm{P/poly}$ is
sufficient to carry out limited derandomization of MA. If a stronger hypothesis is
used (i.e., one assumes that some P-printable set has large $\mathrm{KT}^{\mathrm{SAT}}$ complexity
instead of merely large KT complexity), then one obtains a derandomization of
AM [22].

Let us now turn our attention from Hypothesis A to Hypothesis B. 
Theorem 13. The following are equivalent:

1. Hypothesis B

2. There exists a P-printable set L and an $\epsilon > 0$ such that for all large n,
$\mathrm{KT}_L(n)$ is defined and $\mathrm{KT}_L(n) > n^{\epsilon}$.

3. For all dense L in P/poly, $\mathrm{Kt}_L(n) = O(\log n)$.

4. There is a language $A \in \mathrm{E}$ and a constant $\epsilon > 0$ such that, for all large n,
there is no circuit of size $2^{\epsilon n}$ accepting $A^{=n}$.

5. Efficient pseudorandom generators G exist, such that $G : \Sigma^{O(\log n)} \to \Sigma^n$.

6. Efficient hitting set generators exist.

That is, in particular, Hypothesis B is equivalent to the hypothesis of Theorem 5.
Proof. Items 4 through 6 in the list above have been observed before to be
equivalent [30,31,29]. The equivalence (1 $\Leftrightarrow$ 2 $\Leftrightarrow$ 4) is straightforward and
similar to the proof of Theorem 12. It remains only to consider condition 3.

Condition 3 is easily seen to be equivalent to the existence of a hitting
set generator (i.e., a polynomial-time-computable function that, on input n,
outputs a list S of strings of length n with the property that any circuit of size
n accepting at least $2^n/n$ elements of $\Sigma^n$ must accept an element of S). If we
assume 3 is true, then the routine that lists all strings of length n having Kt
complexity $O(\log n)$ is a hitting set generator. Conversely, if we assume that
hitting set generators exist, note that all strings produced by the generator have
Kt complexity $O(\log n)$.
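The hitting property itself can be checked by brute force for tiny n. In the sketch below (an illustration only), a hypothetical family of tests stands in for circuits of size n: the conjunctions $x_j = a \wedge x_k = b$, each of which accepts exactly a quarter of $\Sigma^n$, so for n = 4 each one meets the $2^n/n$ density threshold. The sketch checks two candidate lists for n = 4: five strings that hit every such test, and the pair 0000, 1111, which does not.

```python
from itertools import combinations, product
from typing import Callable, Iterable, List, Tuple

Bits = Tuple[int, ...]
Test = Callable[[Bits], bool]


def is_hitting_set(candidates: List[Bits], tests: Iterable[Test], n: int) -> bool:
    """True if every test accepting at least 2**n / n strings accepts a candidate."""
    for test in tests:
        accepted = sum(1 for bits in product((0, 1), repeat=n) if test(bits))
        is_dense = accepted * n >= 2 ** n   # avoids fractional 2**n / n
        if is_dense and not any(test(c) for c in candidates):
            return False
    return True


def two_literal_tests(n: int) -> List[Test]:
    """Hypothetical toy family: tests of the form x_j == a and x_k == b."""
    tests: List[Test] = []
    for j, k in combinations(range(n), 2):
        for a, b in product((0, 1), repeat=2):
            # Default arguments freeze j, k, a, b for each lambda.
            tests.append(lambda x, j=j, k=k, a=a, b=b: x[j] == a and x[k] == b)
    return tests


def parse(strings: List[str]) -> List[Bits]:
    return [tuple(int(ch) for ch in s) for s in strings]


if __name__ == "__main__":
    tests = two_literal_tests(4)
    print(is_hitting_set(parse(["0000", "0111", "1011", "1101", "1110"]), tests, 4))  # True
    print(is_hitting_set(parse(["0000", "1111"]), tests, 4))                          # False
```

The five-string list only defeats this toy family; a polynomial-time generator that hits every dense set accepted by small circuits is a much stronger object, whose existence, by Theorem 13, is equivalent to Hypothesis B.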

The preceding theorems gave equivalent ways to view the question of whether 
or not P-printable sets can have large KT-complexity. The following theorem 
shows that it is of interest to consider the KT-complexity of arbitrary sets in P. 

Theorem 14. The following are equivalent: 

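1. For all $L \in \mathrm{P}$, $\mathrm{KT}_L(n) = \log^{O(1)} n$.

2. All NEXP search problems are solvable in P/poly.

3. $\mathrm{NEXP} \subseteq \mathrm{P/poly}$

4. NEXP = MA

5. $\mathrm{MA} \not\subseteq \bigcap_{\epsilon > 0} \text{io-}(\mathrm{NTime}(2^{n^{\epsilon}})/n^{\epsilon})$.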

Proof. The equivalence (1 $\Leftrightarrow$ 2) is straightforward. Equivalences (3 $\Leftrightarrow$ 4 $\Leftrightarrow$ 5)
and implication (3 $\Rightarrow$ 2) were established in [25]. The remaining
implication (2 $\Rightarrow$ 3) is trivial.

6 The Question of Density 

It is obvious that there are sets L in P/poly (and even in non-uniform $\mathrm{AC}^0$)
with high $\mathrm{Kt}_L$ complexity (and hence with high $\mathrm{KT}_L$ complexity), since for each
length n one can select a string $x_n$ of length n with high Kt-complexity, and build
a circuit that accepts exactly $x_n$. However, as observed earlier, dense languages L
in P/poly must have low $\mathrm{Kt}_L$ complexity, or else secure pseudorandom generators
do not exist.

Theorems 3, 4, 10, and 11 show that the Kolmogorov complexities of sets in P
and $\mathrm{AC}^0$ are closely related. It is natural to wonder if the Kolmogorov complexities
of dense sets in P/poly and non-uniform $\mathrm{AC}^0$ are also related. No such
relationship is known, although dense sets in $\mathrm{AC}^0$ provide one of the few examples where
we are able to say something nontrivial about Kt and KT complexity.

It was essentially observed by [32] that the derandomization results of Nisan
and Wigderson [28] can be used to show that, for dense L in $\mathrm{AC}^0$, $\mathrm{Kt}_L(n) =
\log^{O(1)} n$. Closer examination of the Nisan-Wigderson generator shows that an
even stronger conclusion holds:

Theorem 15. Let L be a dense language in $\mathrm{AC}^0$. Then $\mathrm{KT}_L(n) = \log^{O(1)} n$.
For even slightly larger classes (such as $\mathrm{AC}^0[2]$) no nontrivial bound on the
$\mathrm{KT}_L$ or $\mathrm{Kt}_L$ complexity is known.

It is natural to wonder if the bound of $\log^{O(1)} n$ in the preceding theorem
can be improved to $O(\log n)$. The results of Agrawal [33] can be used to show
that this question is equivalent to a question in circuit complexity:

Theorem 16. The following are equivalent.

— For all dense L in non-uniform $\mathrm{AC}^0$, $\mathrm{Kt}_L(n) = O(\log n)$.

— For some $\epsilon > 0$, $\mathrm{E} \not\subseteq \mathrm{AC}^0(2^{\epsilon n})$.



7 Natural Proofs, Lower Bounds, and Kolmogorov Complexity

We have seen that there are good reasons to believe that dense sets in P/poly 
must have low Kt complexity, but we have not discussed any consequences of 
the existence of dense sets in P/poly with large KT complexity. 

It turns out that this is exactly the question that was raised by Razborov 
and Rudich when they developed their framework of “natural proofs” to explain 
why certain approaches to proving circuit lower bounds appear to be doomed to 
failure [6]. Let us recall some of the definitions of Razborov and Rudich. 

Definition 6. [6] A Combinatorial Property of Boolean functions is a subset of
$\bigcup_{n \in \mathbb{N}} \Sigma^{2^n}$; that is, it is a set of truth tables.

A combinatorial property C is useful against P/poly if any language L in
P/poly has the property that $\{n : \text{the truth table for } L^{=n} \text{ lies in } C\}$ is finite.

Razborov and Rudich also consider a “largeness” condition; we shall discuss that 
later. 

Note that, by Theorem 9, if KT(x) is large, then there is some $i < |x|$ such
that $x0^i$ is the truth table of a function requiring large circuits. From this simple
observation, we can see that a language L where $\forall k\ \forall^{\infty} n\ \mathrm{KT}_L(n) > \log^k n$ is the
same thing as a combinatorial property useful against P/poly.

Impagliazzo, Kabanets, and Wigderson observed in [25] that if there is a
combinatorial property in NP that is useful against P/poly, then $\mathrm{NEXP} \not\subseteq \mathrm{P/poly}$.
They then asked if the existence of a combinatorial property in P that is useful
against P/poly implies that $\mathrm{EXP} \not\subseteq \mathrm{P/poly}$. However, as in the proof of
Theorem 10, we can see that there is a combinatorial property in NP that is useful
against P/poly if and only if such a property exists in $\mathrm{AC}^0$.

The other crucial ingredient that a combinatorial property needs in order to
be “natural” (in the framework of Razborov and Rudich) is that it needs to be
dense. (That is, it must contain at least $2^N/N^{O(1)}$ strings of length N, for all N of the form
$2^n$. All theorems in this paper involving “dense” sets carry over if we change our
density requirement from $2^n/n$ to $2^n/n^{O(1)}$.)

Razborov and Rudich showed that, under strong (but widely believed)
assumptions about the existence of secure pseudorandom generators, there is no
dense combinatorial property in P/poly that is useful against P/poly. Hence,
under the hypothesis of [6], for all dense $L \in \mathrm{P/poly}$, $\mathrm{KT}_L(n) = \log^{O(1)} n$.

Note that there are very close connections between the notion of Natural 
Proofs and certain aspects of resource-bounded measure theory [34,35]. There 
is not space here to elaborate on these connections. However, this suggests that 
there may be interesting implications between time-bounded Kolmogorov
complexity and resource-bounded measure that deserve to be explored.

8 Random Strings 

The canonical dense sets with large Kolmogorov complexity are the sets
containing all of the random strings. For our purposes, it will suffice to consider the
following two sets:

— $R_{\mathrm{KT}}$ is defined to be $\{x \mid \mathrm{KT}(x) > |x|/2\}$.

— $R_{\mathrm{Kt}}$ is defined to be $\{x \mid \mathrm{Kt}(x) > |x|/2\}$.

Similar (but not identical) notions of time-bounded randomness have been 
considered before [36,37,38,39]. 
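Both sets are dense, by a standard counting argument. If $\mathrm{KT}(x) \le n/2$ for a string x of length n, then some d with $|d| \le n/2$ satisfies $U(d, i) = x_i$ for all $i \le n$, so d determines x; the same holds for Kt, since $\log t \ge 0$. Counting the available descriptions gives the bound below, and likewise for $R_{\mathrm{Kt}}$:

```latex
\begin{align*}
\bigl|\{x \in \Sigma^n : \mathrm{KT}(x) \le n/2\}\bigr|
  &\le \bigl|\{d : |d| \le n/2\}\bigr| \;=\; 2^{\lfloor n/2 \rfloor + 1} - 1 \;<\; 2^{n/2+1},\\
\bigl|R_{\mathrm{KT}} \cap \Sigma^n\bigr| &> 2^n - 2^{n/2+1} \;\ge\; 2^n/n \qquad (n \ge 4).
\end{align*}
```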

$R_{\mathrm{KT}}$ is in coNP, and $R_{\mathrm{Kt}}$ is in E. It seems natural to conjecture that these
upper bounds are essentially optimal. However, there are significant obstacles to 
showing that there are no smaller complexity classes containing these languages. 
Most significantly, there is reason to believe that these languages are not complete 
for coNP and E, respectively. For some notions of completeness, it is even possible 
to prove that this is the case. 

Theorem 17. $R_{\mathrm{Kt}}$ is not hard for E under polynomial-time many-one
reductions.

Proof. We provide only a sketch. Let T be a subset of $0^*$ that is in E but not
in P. Let f be any polynomial-time function. Note that $\mathrm{Kt}(f(0^n)) = O(\log n)$.
Thus $f(0^n)$ is not in $R_{\mathrm{Kt}}$ unless $f(0^n)$ is very short, in which case membership
in $R_{\mathrm{Kt}}$ can be determined in polynomial time. Hence f cannot reduce T to
$R_{\mathrm{Kt}}$, since otherwise T would be in P.

Essentially the same proof also shows that $R_{\mathrm{Kt}}$ is not complete under
polynomial-time truth-table reductions.

Buhrman and Mayordomo show that a related (but not identical) language 
is not hard for E even under polynomial-time Turing reductions (and in fact 
they obtain even stronger conclusions relating to resource-bounded measure) 
[40]. However, their argument does not seem to extend to $R_{\mathrm{Kt}}$.

By Theorem 15, neither $R_{\mathrm{Kt}}$ nor $R_{\mathrm{KT}}$ is in (non-uniform) $\mathrm{AC}^0$. Somewhat
surprisingly, as the deadline for submission of this paper approaches, I am unable
to find or construct any other lower bounds on the complexity of either of these
problems (other than conditional lower bounds, such as the argument in [6]
showing that these sets are not in P/poly if strong pseudorandom generators
exist).

Ko presents oracles relative to which a set closely related to $R_{\mathrm{KT}}$ is not
complete for coNP. The recent proof by Agrawal, showing that all sets complete
for NP under $\mathrm{AC}^0$ many-one reductions are $\mathrm{AC}^0$-isomorphic [41], can be used to
show that $R_{\mathrm{Kt}}$ is not hard even for some very small subclasses of P. (In addition,
it provides conditions where $R_{\mathrm{KT}}$ also fails to be hard.)

Theorem 18. $R_{\mathrm{Kt}}$ is not hard for $\mathrm{TC}^0$ under $\mathrm{AC}^0$-many-one reductions. Also,
if SAT has circuits of size $2^{n^{o(1)}}$, then $R_{\mathrm{KT}}$ is not hard for $\mathrm{TC}^0$ under
$\mathrm{AC}^0$-many-one reductions.

Proof. Again, we provide only a sketch. Agrawal shows in [41] that any set hard
for $\mathrm{TC}^0$ under $\mathrm{AC}^0$ reductions is hard under length-increasing $\mathrm{NC}^0$ reductions
where the connections in the $\mathrm{NC}^0$ circuits are computable in $\mathrm{AC}^0$. Thus, in
particular, the output of the reduction has small Kt complexity and cannot
lie in $R_{\mathrm{Kt}}$. For KT complexity, we require an additional assumption. Under
the assumption that SAT has small circuits, there is a small circuit that takes
i as input and produces an $\mathrm{NC}^0$ circuit for the i-th bit of the reduction as
output. Now one can show that the output of the reduction must have small KT
complexity, which again foils the reduction.

For instance, if P = NP, then $R_{Kt} \notin$ P (by Theorem 13) and $R_{KT} \in$ P,
but $R_{KT}$ is not hard for $\mathrm{TC}^0$ under $\mathrm{AC}^0$-many-one reductions.

Cai and Kabanets study the question of whether a set closely related to $R_{KT}$
is in P or is NP-complete. For instance, if $R_{KT}$ is complete under
length-increasing polynomial-time reductions, then there is a P-printable set
$L$ such that $\mathrm{Kt}_L$ grows very quickly, and hence one obtains all of
the consequences listed in Theorem 13.

9 Conclusions 

Kolmogorov complexity has been very useful in attacking a wide variety of
problems in computer science and mathematics. It has not found much application
(yet) in the field of derandomization. (A notable exception is the work of
Trevisan [42].) It is hoped that the definitions provided here will be useful in
formulating new lines of attack on these problems.

In particular, it seems quite likely to me that better bounds can be proved
on the complexity of $R_{KT}$. It should be possible to prove sublinear bounds on
the growth of $\mathrm{Kt}_L$ for dense sets $L$ in $\mathrm{AC}^0$ [2]. I feel sure
that there are new and interesting implications waiting to be discovered
involving the KT and $\mathrm{Kt}_L$ complexity of various classes of sets. Since
these notions have tight connections to the theory of natural proofs (and hence
to resource-bounded measure), I think it is possible that investigation of these
questions will have application not only to the study of derandomization, but
throughout complexity theory.
Geometric optimization problems arise in many disciplines and are often
NP-hard. One example is the famous Traveling Salesman Problem (TSP): given $n$
points in the plane (more generally, in $\mathbb{R}^d$), find the shortest closed
path that visits them all.

Computing the optimum solution is NP-hard, which means that it has no
efficient (i.e., polynomial-time) algorithm if P $\neq$ NP. Hence the focus has
shifted to designing approximation algorithms: algorithms that compute a
solution whose cost is within a small multiplicative factor of the optimum. For
instance, the classical Christofides heuristic can compute, for every TSP
instance, a solution whose cost is within 50% of the cost of the optimum solution.

Work in the past few years has led to dramatically better approximation 
algorithms for the TSP and many other geometric problems such as Steiner Tree, 
k-median, k-MST, facility location, minimum latency, etc. All these problems are 
now known to have approximation schemes, which are algorithms that, for every 
$\epsilon > 0$, can compute a solution of cost at most $(1 + \epsilon)$ times the optimum.

The talk will survey these approximation schemes and explain the ideas
behind their design. The talk will be self-contained.

Bern and Eppstein [6] surveyed this field in their 1995 article. The author is 
preparing an up-to-date survey which should be available soon from his webpage at
http://www.cs.princeton.edu. The article will also list the remaining major 
open problems in the field. 
Abstract. We propose a novel approach to clustering, based on deterministic
analysis of random walks on the weighted graph associated with the clustering
problem. The method is centered around what we shall call separating operators,
which are applied repeatedly to sharpen the distinction between the weights of
inter-cluster edges (the so-called separators) and those of intra-cluster edges.
These operators can be used on their own for some problems, but become
particularly powerful when embedded in a classical multi-scale framework and/or
enhanced by other known techniques, such as agglomerative clustering. The
resulting algorithms are simple, fast and general, and appear to have many
useful applications.



1 Introduction 

Clustering is a classical problem, applicable to a wide variety of areas. It calls 
for discovering natural groups in data sets, and identifying abstract structures 
that might reside there. Clustering methods have been used in computer vision 
[11,2], VLSI design [4], data mining [3], web page clustering, and gene expression
analysis. 

Prior literature on the clustering problem is huge; see, e.g., [7]. However, to
a large extent the problem remains elusive, and there is still a dire need for a 
clustering method that is natural and robust, yet very efficient in dealing with 
large data sets. 

In this paper, we present a new set of clustering algorithms, based on
deterministic exploration of random walks on the weighted graph associated with
the data to be clustered. We use the similarity matrix of the data set, so no
explicit representation of the coordinates of the data-points is needed. The
heart of the method is in what we shall call separating operators, which are
applied to the graph iteratively. Their effect is to ‘sharpen’ the distinction
between the weights of inter-cluster edges (those that ought to separate
clusters) and intra-cluster edges (those that ought to remain inside a single
cluster), by decreasing the former and increasing the latter. The operators can
be used on their own for some kinds of problems, but their power becomes more
apparent when embedded in a classical multi-scale framework and when enhanced by
other known techniques, such as agglomerative or hierarchical clustering.

The resulting algorithms are simple, fast and general. As to the quality of the
clustering, we exhibit encouraging results of applying these algorithms to several
recently published data sets. However, in order to better assess their
usefulness, we are in the process of experimenting in other areas of application
too.



2 Basic Notions 



We use standard graph-theoretic notions. Specifically, let $G(V,E,w)$ be a
weighted graph, which should be viewed as modeling a relation $E$ over a set
$V$ of entities. Assume, without loss of generality, that the set of nodes $V$ is
$\{1, \ldots, n\}$. The function $w: E \to \mathbb{R}^{+}$ is a weighting function
that measures the similarity between pairs of items (a higher value means more
similar). Let $S \subseteq V$. The set of nodes that are connected to some node
of $S$ by a path with at most $k$ edges is denoted by $V^k(S)$. The degree of $G$,
denoted by $deg(G)$, is the maximal number of edges incident to some single node
of $G$. The subgraph of $G$ induced by $S$ is denoted by $G(S)$. The edge between
$i$ and $j$ is denoted by $(i,j)$. Sometimes, when the context is clear, we will
write simply $(i,j)$ instead of $(i,j) \in E$.
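The set $V^k(S)$ can be computed by a breadth-first search truncated at depth $k$. The following is a minimal Python sketch, assuming an adjacency-dictionary representation (node to {neighbor: weight}, stored symmetrically); the names `k_neighborhood` and `degree` are our own. Note that $S$ itself is included, since each node of $S$ is joined to itself by a path with zero edges.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

# Assumed representation: node -> {neighbor: weight}, stored symmetrically.
Graph = Dict[Hashable, Dict[Hashable, float]]


def k_neighborhood(g: Graph, sources: Iterable[Hashable], k: int) -> Set[Hashable]:
    """V^k(S): nodes joined to some node of S by a path with at most k edges."""
    hops = {s: 0 for s in sources}  # every node of S is at distance 0
    queue = deque(hops)
    while queue:
        node = queue.popleft()
        if hops[node] == k:
            continue  # expanding further would exceed k edges
        for nbr in g[node]:
            if nbr not in hops:
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    return set(hops)


def degree(g: Graph) -> int:
    """deg(G): maximal number of edges incident to a single node."""
    return max(len(nbrs) for nbrs in g.values())
```

For a bounded-degree graph, the search touches at most $O(deg(G)^k)$ nodes per source, independently of $|V|$.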

A random walk is a natural stochastic process on graphs. Given a graph and
a start node, we select a neighbor of the node at random, and ‘go there’, after
which we continue the random walk from the newly chosen node. The probability
of a transition from node $i$ to node $j$ is

$$p_{ij} = \frac{w(i,j)}{d_i},$$

where $d_i = \sum_{(i,k) \in E} w(i,k)$ is the weighted degree of node $i$.

Given a weighted graph $G(V,E,w)$, the associated transition matrix, denoted
by $M^G$, is the $n \times n$ matrix in which, if $i$ and $j$ are connected, the
$(i,j)$'th entry is simply $p_{ij}$. Hence, we have

$$M^G_{ij} = \begin{cases} p_{ij} & (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$$

Now, denote by $P^k_{visit}(i) \in \mathbb{R}^n$ the vector whose $j$-th component
is the probability that a random walk originating at $i$ will visit node $j$ in
its $k$-th step. Thus, $P^k_{visit}(i)$ is the $i$-th row in the matrix
$(M^G)^k$, the $k$'th power of $M^G$.
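To make these definitions concrete, here is a small sketch that builds $M^G$ and obtains $P^k_{visit}(i)$ by pushing the unit distribution at $i$ through $k$ steps of the walk (equivalently, multiplying the $i$-th unit row vector by $M^G$ $k$ times). It assumes nodes are labeled $0, \ldots, n-1$ and uses plain Python lists rather than a matrix library; the function names are hypothetical.

```python
from typing import Dict, List

# Assumed representation: nodes 0..n-1, node -> {neighbor: weight}, stored symmetrically.
Graph = Dict[int, Dict[int, float]]


def transition_matrix(g: Graph) -> List[List[float]]:
    """M^G with entries p_ij = w(i, j) / d_i for (i, j) in E, and 0 otherwise."""
    n = len(g)
    m = [[0.0] * n for _ in range(n)]
    for i, nbrs in g.items():
        d_i = sum(nbrs.values())  # weighted degree of node i
        for j, w in nbrs.items():
            m[i][j] = w / d_i
    return m


def visit_probabilities(g: Graph, i: int, k: int) -> List[float]:
    """P^k_visit(i): row i of (M^G)^k, computed as e_i . M^G . ... . M^G (k times)."""
    m = transition_matrix(g)
    n = len(g)
    dist = [0.0] * n
    dist[i] = 1.0  # the walk starts at i
    for _ in range(k):
        dist = [sum(dist[a] * m[a][b] for a in range(n)) for b in range(n)]
    return dist
```

On the weighted path 0–1–2 with $w(0,1) = 1$ and $w(1,2) = 3$, a walk from node 0 is at node 1 after one step, and after two steps it is back at 0 with probability 1/4 or at 2 with probability 3/4.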

The stationary distribution of $G$ is a vector $p \in \mathbb{R}^n$ such that
$p \cdot M^G = p$. An important property of the stationary distribution is that
if $G$ is connected and non-bipartite, then $P^k_{visit}(i)$ tends to the
stationary distribution as $k$ goes to $\infty$, regardless of the choice of $i$.
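For an undirected weighted graph the stationary distribution has a simple closed form, $p_i = d_i / \sum_j d_j$ (a standard fact, easily checked against $p \cdot M^G = p$). The sketch below assumes nodes labeled $0, \ldots, n-1$; it computes this vector and lets one check numerically that repeated walk steps converge to it on a connected, non-bipartite graph, but keep oscillating on a bipartite one. The function names are ours.

```python
from typing import Dict, List

# Assumed representation: nodes 0..n-1, node -> {neighbor: weight}, stored symmetrically.
Graph = Dict[int, Dict[int, float]]


def stationary_distribution(g: Graph) -> List[float]:
    """p_i = d_i / sum_j d_j, which satisfies p . M^G = p for undirected graphs."""
    degrees = [sum(g[i].values()) for i in range(len(g))]
    total = sum(degrees)
    return [d / total for d in degrees]


def walk_step(g: Graph, dist: List[float]) -> List[float]:
    """Return dist . M^G, where (M^G)_ij = w(i, j) / d_i."""
    out = [0.0] * len(g)
    for i, nbrs in g.items():
        d_i = sum(nbrs.values())
        for j, w in nbrs.items():
            out[j] += dist[i] * w / d_i
    return out
```

On a triangle with one pendant node attached, 200 steps from the pendant already agree with $p$ to high precision; on a single edge (a bipartite graph), the distribution flips between the two endpoints forever.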

The escape probability from a source node $s$ to a target node $t$, denoted by
$P_{escape}(s,t)$, is defined as the probability that a random walk originating
at $s$ will reach $t$ before returning to $s$. This probability can be computed as
follows. For every $i \in V$, define a variable $p_i$ satisfying:

$$p_s = 0, \qquad p_t = 1, \qquad p_i = \sum_{(i,j) \in E} p_{ij} \cdot p_j \quad \text{for } i \neq s,\ i \neq t.$$

The values of $p_i$ are calculated by solving these equations¹, and then the
desired escape probability is given by:

$$P_{escape}(s,t) = \sum_{(s,i) \in E} p_{si} \cdot p_i.$$
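The linear system above is small and sparse, so for illustration it can be solved directly. The sketch below assumes integer or `Fraction` edge weights in an adjacency dictionary and uses exact rational arithmetic with a hand-written Gauss-Jordan elimination (not the symmetric positive-definite formulation mentioned in the footnote); it also assumes the graph is connected, so that the system is nonsingular. The function names are ours.

```python
from fractions import Fraction
from typing import Dict, List

# Assumed representation: node -> {neighbor: weight}, stored symmetrically.
Graph = Dict[int, Dict[int, Fraction]]


def solve(a: List[List[Fraction]], b: List[Fraction]) -> List[Fraction]:
    """Gauss-Jordan elimination in exact arithmetic; assumes `a` is nonsingular."""
    n = len(b)
    rows = [a[r][:] + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[r][n] / rows[r][r] for r in range(n)]


def escape_probability(g: Graph, s: int, t: int) -> Fraction:
    """P_escape(s, t): a walk from s reaches t before returning to s."""
    inner = [i for i in g if i not in (s, t)]
    index = {node: r for r, node in enumerate(inner)}
    a = [[Fraction(0)] * len(inner) for _ in inner]
    b = [Fraction(0)] * len(inner)
    for i in inner:  # row for: p_i - sum_j p_ij * p_j = 0
        r, d_i = index[i], sum(g[i].values())
        a[r][r] += 1
        for j, w in g[i].items():
            p_ij = Fraction(w) / d_i
            if j == t:
                b[r] += p_ij  # p_t = 1 moves to the right-hand side
            elif j != s:
                a[r][index[j]] -= p_ij  # p_s = 0 contributes nothing
    p = dict(zip(inner, solve(a, b))) if inner else {}
    p[s], p[t] = Fraction(0), Fraction(1)
    d_s = sum(g[s].values())
    return sum((Fraction(w) / d_s * p[j] for j, w in g[s].items()), Fraction(0))
```

For example, on the unit-weight path 0–1–2–3 a walk from 0 must step to 1, and from there it reaches 3 before 0 with probability 1/3 (the classical gambler's-ruin value), so $P_{escape}(0,3) = 1/3$.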



3 The Clustering Problem 

The common definition of the clustering problem is as follows. Partition $n$ given
data points into $k$ clusters, such that points within a cluster are more similar
to each other than ones taken from different clusters. The $n$ data points are
specified either in terms of their coordinates in a $d$-dimensional space or by
means of an $n \times n$ similarity matrix, whose elements $s_{ij}$ measure the
similarity of data points $i$ and $j$.

Our algorithms use the similarity matrix only, and thus can deal with cases
where pairwise similarity is the only information available about the data.
Specifically, we address the problem of clustering the weighted graph $G(V,E,w)$.
Most often we prefer to model the data using sparse graphs, which contain only a
small subset of the edges of the complete graph, those corresponding to higher
similarity values. Working with sparse graphs has several advantages. First, it
reduces the time and space complexity, and second, the “structure” of adequate
sparse graphs expresses the arrangement of the data, thus helping the clustering
process.

A desirable property of a clustering algorithm is its ability to determine the
number $k$ of natural clusters. In reality, however, most clustering algorithms
require this number to be an input, which means that they may break up or 
combine natural clusters, or even create clusters when no natural ones exist in 
the data. 

The problem as described above is inherently ill-posed, since a set of points 
can be clustered naturally in many ways. For example, Figure 1(a) has three
clusters, but one could argue that there are only two, since the two on the right 
hand side are close enough to be thought of as one. In Figure 1(b) one could 
argue for and against dividing the points in the top dense region into two highly 
connected natural clusters. A solution to such ambiguities is to use hierarchical 
clustering, which employs a parameter for controlling the desired resolution. 

Various cost functions, sometimes called objective functions, have been
proposed in order to measure the quality of a given clustering. Viewing the
clustering problem as an optimization problem of such an objective function
formalizes the problem to some extent. However, we are not aware of any function
that optimally captures the notion of a ‘good’ cluster, since for any function one
can exhibit cases for which it fails. Furthermore, not surprisingly, no
polynomial-time algorithm for optimizing such cost functions is known. In fact,
a main role of cost
¹ Notice that when multiplying each row $i$ by $d_i$, the weighted degree of the
respective node, the system is represented by a symmetric positive-definite
matrix, which is easier to solve.
Fig. 1. Inherent ambiguity in clustering: How many clusters are there here?



functions for clustering is to obtain some intuition about the desired properties 
of a good clustering, and to serve as an objective metric for distinguishing a 
good clustering from a bad one. 



3.1 Clustering Methods 

We now survey some clustering approaches. Instead of providing specific
references to each method, we point the reader to the surveys in [7,8].

Clustering methods can be broadly classified into hierarchical and partitional 
approaches. Partitional clustering algorithms obtain a single partition of the data 
that optimizes a certain criterion. The most widely used criterion is minimizing 
the overall squared distance between each data point and the center of its related 
cluster. This tends to work well with isolated and compact clusters. The most
common methods of this kind are the k-Means algorithm (which is suitable only for
points in a metric space) and the k-Medoid algorithm. An advantage of these
algorithms is their robustness to outliers (nodes that cannot be classified into
a natural cluster). Another advantage is their quick running time. Their major
drawback is a tendency to produce spherically shaped clusters of similar sizes,
which often prevents them from finding the natural clusters. For example,
consider the graph in Figure 2. A natural clustering decomposition of this graph
is into two rectangular grids, the larger left-hand-side grid and the smaller
right-hand-side grid. However, these methods will attach some nodes of the
left-hand-side grid to the nodes of the right-hand-side grid, seeking to minimize
the distance of each node to the center of its related cluster.

Hierarchical algorithms create a sequence of partitions in which each partition
is nested into the next partition in the sequence. Agglomerative clustering is a
well-known hierarchical clustering method that starts from the trivial partition
of n points into n clusters of size 1 and continues by repeatedly merging pairs
of clusters. At each step the two clusters that are most similar are merged,
until the clustering is satisfactory. Different similarity measures between
clusters result in different agglomerative algorithms. The most widely used
variants are the Single-Link and the Complete-Link algorithms. In Single-Link
clustering, similarity between clusters is measured as the similarity between the
most similar pair of elements, one from each of the clusters, while in
Complete-Link clustering the similarity is measured using the least similar pair
of elements.
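The two linkage rules can be illustrated with a deliberately naive sketch on an $n \times n$ similarity matrix. This is not the authors' code: it stops when a requested number of clusters remains (a simple stand-in for "until the clustering is satisfactory") and recomputes all linkages at every step, so it is far slower than practical implementations. The function name and parameters are hypothetical.

```python
from itertools import combinations
from typing import List, Set


def agglomerate(sim: List[List[float]], num_clusters: int,
                linkage: str = "single") -> List[Set[int]]:
    """Naive agglomerative clustering on a similarity matrix (higher = more similar).

    single:   cluster similarity = similarity of the MOST similar cross pair.
    complete: cluster similarity = similarity of the LEAST similar cross pair.
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    pick = max if linkage == "single" else min
    clusters: List[Set[int]] = [{i} for i in range(len(sim))]
    while len(clusters) > num_clusters:
        # Merge the pair of clusters with the highest linkage similarity.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda ab: pick(sim[i][j] for i in clusters[ab[0]] for j in clusters[ab[1]]),
        )
        clusters[a] |= clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters
```

On four points at positions 0, 1, 2.1 and 3.3 on a line (with similarity taken as negative distance), Single-Link "chains" the first three points together and leaves the last one alone, whereas Complete-Link returns the two balanced pairs.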
The Complete-Link algorithm tends to break up a relatively large (though
natural) cluster into two (unnatural) clusters, and will face similar difficulties
to the partitional algorithms discussed above. The Single-Link algorithm has a
different problem, the “chaining effect”: it can be easily fooled by outliers,
merging two clusters that are connected by a narrow string of points. For
example, when applied to the graph of Figure 2, it will fail, since the distance
between the left-hand-side grid and the right-hand-side grid is equal to the
distance between any two adjacent subgraphs.




Fig. 2. A natural clustering decomposition of this graph is to divide it into two clus- 
ters: the left-hand-side larger grid and the right-hand-side smaller grid. The two nodes 
connecting these grids are outliers. Our clustering method reveals this decomposition, 
unlike many traditional clustering methods that will not discover it. 



4 Cluster Analysis by Random Walks 

4.1 Cluster Quality 

Our work is motivated by the following predicate, with which we would like to 
capture a certain notion of the quality of a cluster: 

Definition 4.1 A cluster $C$ is $(d, \alpha)$-normal iff for every $i,j \in C$ for
which $dist(i,j) < d$, the probability that a random walk originating at $i$ will
reach $j$ before it visits some node outside $C$ is at least $\alpha$.

The role of $\alpha$ is obvious. The reason for limiting the distance between $i$
and $j$ to some $d$ is that for clusters with a large enough diameter it may be
easier to escape out of the cluster than to travel between distant nodes inside
the cluster. This demonstrates the intuition that in a natural cluster, we need
not necessarily seek a tight connection between every two nodes, but only
between ones that are close enough. For example, consider Figure 2. Random walks
starting at nodes on the right-hand side of the large cluster will probably
visit nearby nodes of the other cluster before visiting distant nodes of their
own cluster.
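Definition 4.1 can be checked directly for a given candidate cluster, even though finding clusters that satisfy it is another matter. In this sketch, the probability of reaching $j$ before leaving $C$ is obtained by fixed-point (Gauss-Seidel) iteration of the harmonic equations with value 1 at $j$ and 0 outside $C$. We assume that $dist(i,j)$ means the unweighted (hop) distance in $G$, which the text does not specify; the names and the tolerance are our own choices.

```python
from collections import deque
from typing import Dict, Set

# Assumed representation: node -> {neighbor: weight}, stored symmetrically.
Graph = Dict[int, Dict[int, float]]


def reach_before_exit(g: Graph, cluster: Set[int], i: int, j: int,
                      tol: float = 1e-12) -> float:
    """Probability that a walk from i hits j before visiting any node outside `cluster`."""
    h = {x: 0.0 for x in g}  # nodes outside the cluster stay at 0
    h[j] = 1.0
    while True:  # sweeps of h(x) = sum_y p_xy h(y) over cluster - {j}
        delta = 0.0
        for x in cluster:
            if x == j:
                continue
            d_x = sum(g[x].values())
            new = sum(w / d_x * h[y] for y, w in g[x].items())
            delta = max(delta, abs(new - h[x]))
            h[x] = new
        if delta < tol:
            return h[i]


def hop_distance(g: Graph, src: int) -> Dict[int, int]:
    """Unweighted shortest-path distances from src (our reading of dist(i, j))."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        x = queue.popleft()
        for y in g[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def is_normal(g: Graph, cluster: Set[int], d: int, alpha: float) -> bool:
    """Check (d, alpha)-normality of `cluster` pair by pair."""
    for i in cluster:
        dist = hop_distance(g, i)
        for j in cluster:
            if j != i and dist.get(j, d) < d and reach_before_exit(g, cluster, i, j) < alpha:
                return False
    return True
```

On the unit-weight path 0–1–2–3 with $C = \{0, 1\}$, a walk from 0 always reaches 1 first, while a walk from 1 reaches 0 before leaving $C$ only with probability 1/2, so $C$ is $(2, 0.5)$-normal but not $(2, 0.6)$-normal.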

In fact, the normality predicate can be seen to define the intuitive notion of 
discontinuities in the data. Such discontinuities indicate the boundaries of the 
clusters, and are created by sharp local changes in the data. 

The normality predicate may label as good the clusters in different clustering
decompositions of the same graph. This may be important in some cases, such as
when we want to identify a hierarchical clustering decomposition. A disadvantage
of this predicate is that when a cluster is not well separated from its
neighborhood, the normality predicate may fail to declare the cluster as natural,
even though its global structure might be very natural. For example, consider
Figure 3. A natural clustering decomposition of this graph is to separate the
left-hand-side and the right-hand-side grids. However, the normality predicate
will not necessarily label these two clusters normal, since there is a relatively
wide portion connecting them.




Fig. 3. Giving the normality predicate a hard time 



Having said all this, we note that we do not have an efficient method for
finding a clustering decomposition that adheres exactly to the normality
predicate. However, the algorithms we have developed were designed to adhere to
its spirit.



4.2 Separators and Separating Operators 

Our approach to identifying natural clusters in a graph is to find ways to compute 
an ‘intimacy relation’ between the nodes incident to each of the graph’s edges. 
In other words, we want to be able to decide for each edge if it should cross the 
boundaries of two clusters (when a discontinuity is revealed), or, rather, if the 
relationship between its two incident nodes is sufficiently intimate for them to 
be contained in a common cluster. 

Definition 4.2 Let the graph $G(V,E)$ be clustered by $\mathcal{C} = \{C_1, \ldots, C_k\}$.
An edge $(u,v) \in E$ is called a separating edge for $\mathcal{C}$, or a separator
for short, if $u \in C_i$, $v \in C_j$ for $i \neq j$.

Any set of edges $F \subseteq E$ gives rise to an induced clustering $\mathcal{C}_F$,
obtained by simply taking the clusters to be the connected components of the
graph $G(V, E - F)$. The set $F$ will then contain every separating edge of
$\mathcal{C}_F$ (and possibly also some edges whose endpoints remain connected
through other paths). Another way of putting this is that if we can indeed decide
which are the separators of a natural clustering of $G$, we are done, since we
will simply take the clustering to be $\mathcal{C}_F$ for the discovered set $F$
of separators.
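Computing the induced clustering $\mathcal{C}_F$ amounts to a connected-components computation. A minimal union-find sketch, assuming edges are given as node pairs and treated as undirected (the function name is ours):

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def induced_clustering(nodes: Iterable[Hashable], edges: Iterable[Edge],
                       separators: Iterable[Edge]) -> List[Set[Hashable]]:
    """C_F: the connected components of G(V, E - F), via union-find."""
    parent = {v: v for v in nodes}

    def find(v: Hashable) -> Hashable:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    removed = {frozenset(e) for e in separators}  # undirected: (u, v) == (v, u)
    for u, v in edges:
        if frozenset((u, v)) not in removed:
            parent[find(u)] = find(v)  # merge the two components
    clusters: Dict[Hashable, Set[Hashable]] = {}
    for v in parent:
        clusters.setdefault(find(v), set()).add(v)
    return list(clusters.values())
```

Removing the middle edge of a path splits it in two, while removing one edge of a triangle leaves a single cluster: that edge belongs to $F$ without being a separator of $\mathcal{C}_F$.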

We have decided to concentrate on discovering a set of separating edges,
since, in the context of the normality predicate, the decision as to whether an
edge should be separating involves only relatively local considerations. Globally
speaking, there might not be much difference between two neighboring nodes,
and the reasons for placing two neighbors in different clusters will most often be
local. Our philosophy, therefore, is that after identifying the separators by local
considerations, we will deduce the global structure of the clustering
decomposition by solving an easy global problem of finding connected components.

The strategy we propose for identifying separators is to use an iterative
process of separation. Separation reweights edges by local considerations in such
a way that the weight of an edge connecting ‘intimately related’ nodes is
increased, and for others it is decreased. This is a kind of sharpening pass, in
which the edges are reweighted to sharpen the distinction between (eventual)
separating and non-separating edges. When the separating operation is iterated
several times, a sort of ‘zero-one’ phenomenon emerges, whereby the weight of
an edge that should be a separator notably diminishes.

We now offer two methods for performing the edge separation, both based 
on deterministic analysis of random walks. 

NS: Separation by neighborhood similarity. A helpful property of the
vector $P^k_{visit}(i)$ is that it provides the level of nearness or intimacy
between the node $i$ and every other node, based on the structure of the graph.
Actually, $P^k_{visit}(i)$ generalizes the concept of weighted neighborhoods,
since $P^1_{visit}(i)$ is exactly the weighted neighborhood of $i$. Also,
$P^{\infty}_{visit}(i)$ does not depend on $i$ and is equal to the stationary
distribution of the graph (when it exists). Hence, the value of $P^k_{visit}(i)$
is not very interesting for overly large values of $k$. We will actually be using
the term $P^{\le k}_{visit}(v)$, which is defined to be
$\sum_{i=1}^{k} P^i_{visit}(v)$.

Now, in order to estimate the closeness of two nodes $v$ and $u$, we fix some
small $k$ (e.g., $k = 3$) and compare $P^{\le k}_{visit}(v)$ and $P^{\le k}_{visit}(u)$.
The smaller the difference, the greater the intimacy between $u$ and $v$. The
reason we use $P^{\le k}_{visit}$ here and not $P^k_{visit}$ is that for a
bipartite subgraph the values of $P^k_{visit}$ can be very different, since the
two random walks originating from $u$ and $v$ cannot visit the same node at the
same time. However, if we are willing to sum some steps of the two walks, we may
find that they visit roughly the same nodes.
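The bipartite effect is easy to see on a tiny example. The sketch below (our own helper names, with sparse dictionary vectors in which missing entries are 0) computes $P^k_{visit}$ and $P^{\le k}_{visit}$. On a 4-cycle, which is bipartite, the two endpoints of an edge have disjoint $P^2_{visit}$ vectors, yet identical $P^{\le 2}_{visit}$ vectors.

```python
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbor: weight}
Vector = Dict[Hashable, float]  # sparse: missing entries are 0


def step(g: Graph, dist: Vector) -> Vector:
    """One step of the random walk: returns dist . M^G."""
    out: Vector = {}
    for a, p in dist.items():
        d_a = sum(g[a].values())
        for b, w in g[a].items():
            out[b] = out.get(b, 0.0) + p * w / d_a
    return out


def visit_k(g: Graph, v: Hashable, k: int) -> Vector:
    """P^k_visit(v)."""
    dist: Vector = {v: 1.0}
    for _ in range(k):
        dist = step(g, dist)
    return dist


def visit_up_to(g: Graph, v: Hashable, k: int) -> Vector:
    """P^{<=k}_visit(v) = sum over i = 1..k of P^i_visit(v)."""
    dist: Vector = {v: 1.0}
    total: Vector = {}
    for _ in range(k):
        dist = step(g, dist)
        for b, p in dist.items():
            total[b] = total.get(b, 0.0) + p
    return total
```

On the 4-cycle 0–1–2–3–0, $P^2_{visit}(0)$ puts mass 1/2 on nodes 0 and 2, while $P^2_{visit}(1)$ puts mass 1/2 on nodes 1 and 3; summing the first two steps gives both nodes mass 1/2 everywhere.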

We now define the separating operator itself: 

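Definition 4.3 Let $G(V,E,w)$ be a weighted graph and $k$ be some small
constant. The separation of $G$ by neighborhood similarity, denoted by $NS(G)$,
is defined to be:

$$NS(G) \stackrel{\text{def}}{=} G_s(V,E,w_s),$$

where $\forall (v,u) \in E$, $w_s(u,v) = sim^k(P^{\le k}_{visit}(v), P^{\le k}_{visit}(u))$.

$sim^k(x,y)$ is some similarity measure of the vectors $x$ and $y$, whose value
increases as $x$ and $y$ become more similar. A suitable choice is:

$$f^k(x,y) \stackrel{\text{def}}{=} \exp(2k - \|x - y\|_{L_1}) - 1$$

The norm $L_1$ is defined in the standard way: for $a, b \in \mathbb{R}^n$,
$\|a - b\|_{L_1} = \sum_{i=1}^{n} |a_i - b_i|$. Since each $P^i_{visit}(\cdot)$ is a
probability vector, $\|x - y\|_{L_1} \le 2k$ for the vectors compared here, so
$f^k$ is always nonnegative.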
Another suitable choice is the cosine, or the correlation, of $x$ and $y$, which is
defined as:

$$\cos(x,y) = \frac{(x,y)}{\sqrt{(x,x)} \cdot \sqrt{(y,y)}}$$

where $(\cdot,\cdot)$ denotes inner product.

The key component in computing $NS(G)$ is the calculation of $P^{\le k}_{visit}(v)$
and $P^{\le k}_{visit}(u)$. If the graph $G$ is of bounded degree,
$P^{\le k}_{visit}(u)$ can be computed in time and space $O(deg(G)^k)$, which is
independent of the size of $G$ and can be treated as a constant. Hence, for
bounded degree graphs $NS(G)$ can be computed in space $O(1)$ and time $O(|E|)$,
which in this case is just $O(|V|) = O(n)$.
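A compact sketch of one application of the NS operator follows. It assumes an adjacency-dictionary representation (node to {neighbor: weight}) and sparse dictionary vectors, so each $P^{\le k}_{visit}$ only touches nodes within $k$ steps, which is what keeps the per-edge cost independent of $|V|$ for bounded degree. Both similarity choices from the text are included; the function names, and passing the similarity as a callable, are our own design choices.

```python
import math
from functools import partial
from typing import Callable, Dict, Hashable, Optional

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbor: weight}
Vector = Dict[Hashable, float]  # sparse vector: missing entries are 0


def visit_up_to(g: Graph, v: Hashable, k: int) -> Vector:
    """P^{<=k}_visit(v); only nodes within k steps of v receive entries."""
    dist: Vector = {v: 1.0}
    total: Vector = {}
    for _ in range(k):
        nxt: Vector = {}
        for a, p in dist.items():
            d_a = sum(g[a].values())
            for b, w in g[a].items():
                nxt[b] = nxt.get(b, 0.0) + p * w / d_a
        dist = nxt
        for b, p in dist.items():
            total[b] = total.get(b, 0.0) + p
    return total


def l1_distance(x: Vector, y: Vector) -> float:
    return sum(abs(x.get(i, 0.0) - y.get(i, 0.0)) for i in x.keys() | y.keys())


def f_similarity(x: Vector, y: Vector, k: int) -> float:
    """f^k(x, y) = exp(2k - ||x - y||_L1) - 1 (nonnegative, since ||x - y||_L1 <= 2k)."""
    return math.exp(2 * k - l1_distance(x, y)) - 1


def cos_similarity(x: Vector, y: Vector) -> float:
    dot = sum(p * y.get(i, 0.0) for i, p in x.items())
    norm_x = math.sqrt(sum(p * p for p in x.values()))
    norm_y = math.sqrt(sum(q * q for q in y.values()))
    return dot / (norm_x * norm_y)


def ns(g: Graph, k: int = 3,
       sim: Optional[Callable[[Vector, Vector], float]] = None) -> Graph:
    """One application of the NS separating operator: reweight every edge."""
    if sim is None:
        sim = partial(f_similarity, k=k)  # the text's suggested choice f^k
    vectors = {v: visit_up_to(g, v, k) for v in g}
    return {u: {v: sim(vectors[u], vectors[v]) for v in g[u]} for u in g}
```

On two unit-weight triangles joined by a single bridge edge, one pass with $k = 3$ already gives the bridge a clearly lower weight than any triangle edge, for both choices of similarity.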



CE: Separation by circular escape. An alternative method for capturing the
extent of intimacy between nodes $u$ and $v$ is by the probability that a random
walk that starts at $v$ visits $u$ exactly once before returning to $v$ for the
first time. (This notion is symmetric, since the event obtained by exchanging the
roles of $v$ and $u$ has the same probability.) If $v$ and $u$ are in different
natural clusters, the probability of such an event will be low, since a random
walk that visits $v$ will likely return to $v$ before reaching $u$ (and the same
with $u$ and $v$ exchanged).

The walk must first escape from $v$ to $u$, and then escape from $u$ back to $v$,
so the probability of this event is given by:

$$P_{escape}(v,u) \cdot P_{escape}(u,v)$$

Seeking efficient computation, and on the reasonable assumption that data
relevant to the intimacy of $v$ and $u$ lies in a relatively small neighborhood
around $v$ and $u$, we can constrain our attention to a limited neighborhood, as
follows:

Definition 4.4 Let $G(V,E,w)$ be a graph, and let $k$ be some constant. Denote
by $P^k_{escape}(v,u)$ the probability $P_{escape}(v,u)$, but computed using random
walks on the subgraph $G(V^k(\{v,u\}))$ instead of on the original graph $G$. The
circular escape probability of $v$ and $u$ is defined to be:

$$CE^k(v,u) \stackrel{\text{def}}{=} P^k_{escape}(v,u) \cdot P^k_{escape}(u,v)$$
We can now define separation by circular escape: 

Definition 4.5 Let $G(V,E,w)$ be a weighted graph, and let $k$ be some small
constant. The separation of $G$ by circular escape, denoted by $CE(G)$, is
defined to be:

$$CE(G) \stackrel{\text{def}}{=} G_s(V,E,w_s),$$

where $\forall (v,u) \in E$, $w_s(u,v) = CE^k(v,u)$.

For graphs with bounded degree, the size of $G(V^k(\{v,u\}))$ is independent of
the size of $G$, so that $CE^k(v,u)$ can be computed essentially in constant time
and space. Hence, as with $NS(G)$, the separating operator $CE(G)$ can be
computed in time $O(|E|) = O(n)$ and space $O(1)$.
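A sketch of the CE operator under similar assumptions: an adjacency dictionary, the local subgraph $G(V^k(\{v,u\}))$ extracted by a depth-$k$ breadth-first search, and escape probabilities obtained by iterating the harmonic equations ($p_s = 0$, $p_t = 1$) until they stop changing, a simple iterative solver substituted for a direct one. The names, the tolerance, and the default $k = 2$ are our own choices; the test graph, two triangles joined by one edge, is only an illustration.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbor: weight}


def k_neighborhood(g: Graph, sources: Iterable[Hashable], k: int) -> Set[Hashable]:
    """V^k(S) by breadth-first search truncated at depth k."""
    hops = {s: 0 for s in sources}
    queue = deque(hops)
    while queue:
        x = queue.popleft()
        if hops[x] < k:
            for y in g[x]:
                if y not in hops:
                    hops[y] = hops[x] + 1
                    queue.append(y)
    return set(hops)


def escape(g: Graph, s: Hashable, t: Hashable, tol: float = 1e-13) -> float:
    """P_escape(s, t), with p_s = 0 and p_t = 1, solved by Gauss-Seidel sweeps."""
    p = {x: 0.0 for x in g}
    p[t] = 1.0
    delta = 1.0
    while delta >= tol:
        delta = 0.0
        for x in g:
            if x in (s, t):
                continue
            d_x = sum(g[x].values())
            new = sum(w / d_x * p[y] for y, w in g[x].items())
            delta = max(delta, abs(new - p[x]))
            p[x] = new
    d_s = sum(g[s].values())
    return sum(w / d_s * p[y] for y, w in g[s].items())


def circular_escape(g: Graph, v: Hashable, u: Hashable, k: int) -> float:
    """CE^k(v, u): escape probabilities computed on the subgraph G(V^k({v, u}))."""
    nodes = k_neighborhood(g, [v, u], k)
    local = {x: {y: w for y, w in g[x].items() if y in nodes} for x in nodes}
    return escape(local, v, u) * escape(local, u, v)


def ce(g: Graph, k: int = 2) -> Graph:
    """One application of the CE separating operator: reweight every edge."""
    return {v: {u: circular_escape(g, v, u, k) for u in g[v]} for v in g}
```

On the two-triangle graph with $k = 2$, the bridge receives $CE^2 = (1/3)^2 = 1/9$, while triangle edges receive 3/8 or 9/16.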
4.3 Clustering by Separation



The idea of separating operators is to uncover and bring to the surface a closeness 
between nodes that exists implicitly in the structure of the graph. Separating 
operators increase the weights of intra-cluster edges and decrease those of inter- 
cluster ones. Iterating the separating operators sharpens the distinction further. 
After a small number of iterations we expect the weights of the two kinds of
edges to differ sufficiently for the difference to be readily apparent, because
the weights of separators are expected to diminish significantly.

The partition of the edges into separators and non-separators is based on
a threshold value, such that all the edges whose weight is below this value are
declared as separators. Without loss of generality, we may restrict ourselves to
the $O(|E|)$ edge weights as candidates for being thresholds. The actual threshold
value (or several, if a hierarchy of decompositions is called for) is found by some
statistical test, e.g., inspecting the edge-weight frequency histogram, where the
frequency of the separators’ weights is usually smaller, since most of the edges
are inside the clusters, and have higher weights than those of the separators.
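The overall procedure (iterate an operator, apply a threshold, take connected components) can be sketched generically, with the separating operator passed in as a parameter so that NS, CE, or any other operator can be plugged in. Because the text leaves the statistical test open, `largest_gap_threshold` is a hypothetical stand-in that cuts at the widest gap between consecutive distinct edge weights; it is not the authors' procedure. All names are ours.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Set

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbor: weight}


def largest_gap_threshold(weights: Iterable[float]) -> float:
    """Hypothetical threshold test: edges below the returned value become separators."""
    ws = sorted(set(weights))
    if len(ws) < 2:
        return ws[0] if ws else 0.0  # a single weight level: no separators
    gaps = [(hi - lo, hi) for lo, hi in zip(ws, ws[1:])]
    return max(gaps)[1]


def cluster_by_separation(g: Graph, operator: Callable[[Graph], Graph], iterations: int,
                          threshold: Optional[float] = None) -> List[Set[Hashable]]:
    """Iterate a separating operator, drop low-weight edges, return the components."""
    for _ in range(iterations):
        g = operator(g)  # each pass sharpens the weights
    if threshold is None:
        threshold = largest_gap_threshold(w for nbrs in g.values() for w in nbrs.values())
    adj: Dict[Hashable, Set[Hashable]] = {v: set() for v in g}
    for v, nbrs in g.items():
        for u, w in nbrs.items():
            if w >= threshold:  # below the threshold: a separator, dropped
                adj[v].add(u)
                adj[u].add(v)
    seen: Set[Hashable] = set()
    clusters: List[Set[Hashable]] = []
    for start in g:  # depth-first search for connected components
        if start in seen:
            continue
        seen.add(start)
        comp, stack = set(), [start]
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        clusters.append(comp)
    return clusters
```

For instance, with an identity "operator" and a graph whose bridge edge already has weight 0.2 against 1.0 elsewhere, the widest gap places the threshold at 1.0 and the bridge becomes the only separator.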

We demonstrate this method by several examples. Consider Figure 4, which 
contains an almost uniformly weighted graph, taken from [12]. We experimented 
with both separating operators, each one with a four-fold iteration. The NS
operator was used with $k = 3$ and $sim^k(x,y) = f^k(x,y)$, and the CE operator
with $k = 2$; other choices work very similarly. The results of both runs appear
along the edges in the figure (with those of CE appearing, multiplied by 100, in 
parentheses). As can be seen, the separation iterations cause the weights of edges 
(3, 18), (7, 8), (6, 10), (1,4), and (8, 18) to become significantly smaller than those 
of the other edges; in fact, they tend to zero in a clear way. We conclude that 
these edges are separators, thus obtaining the natural clustering of the graph by 
removing them and taking each connected component of the resulting graph to 
be a cluster, as indicated in the lower right hand drawing of the figure. 

Notice that the first application of the separating operator already shows
differences in the intimacy that later lead to the clustering, but the results are
not quite sharp enough to make a clear identification of separators. Take edge
(6, 10), for example. We intentionally initialized it to be of weight 1.5, higher
than the other edges, and after the first separation its weight is still too high
for it to be labeled a separator: it is still higher than that of the
non-separating edge (10, 13). However, the next few iterations of the separating
operator cause its weight to decrease rapidly, sharpening the distinction, and
its being a separator becomes obvious.

The success in separating nodes 6 and 10 is particularly interesting, and 
would probably not be possible by many clustering methods. This demonstrates 
how our separation operators integrate structural properties of the graph, and 
succeed in separating these nodes despite the fact that the edge joining them 
has the highest similarity value in the graph. 

Fig. 4. Clustering using four-fold application of separation operators, which sharpen
the edge weight distinction (NS values are on top and CE values, multiplied by 100,
are in parentheses); example taken from [11]. [Figure panels: separation iteration 4;
the clustering (separators dashed)]

David Harel and Yehuda Koren

Figure 5 shows the algorithms applied to a tree, using three-fold separation.
The results clearly establish edges (0,6) and (6,11) as separators. Notice that
the clustering methods that rely on edge-connectivity will fail here, since the
edge-connectivity between every two nodes of the tree is one.





Fig. 5. Clustering a tree using three-fold application of separation (separators are
dashed in iteration 3)



Figure 6 shows seven copies of the complete graph $K_6$ arranged cyclically.
Each node is linked by ‘internal’ edges to the other five nodes in its own complete 
subgraph, and by ‘external’ edges to two nodes of the neighboring complete 
subgraphs. All edges have uniform weight. In the table, one can see how iterations 
of the separating operators diminish the weights of the ‘external’ edges, which 
are clearly found to be separators, decomposing the graph into seven clusters of 
complete subgraphs. 

When applying the separating operators to the graph of Figure 2, the edges 
of the lowest sharpened weight are those outside the two grids, resulting in the 
decomposition of the graph into three clusters, as shown in Figure 7. 

The graph in Figure 8(a) demonstrates the application of our method to a
weighted graph. The edge weights of this graph have been set up to decrease
exponentially with their length. After a three-fold iteration of the CE separating
operator with $k = 3$, and declaring the edges with weight below 0.097 as
separators, the graph is decomposed into the two clusters depicted in Figure 8(b).
Slight changes to the value of $k$, or applying the NS separating operator, produce
similar results, where several nodes on the chains connecting the upper and the
lower clusters become independent clusters of outliers.
Iteration   CE (internal / external)   NS (internal / external)
1           30.56 / 16.21              191.38 / 12.08
2           33.16 / 9.51               279.17 / 0.33
3           34.51 / 4.74               287.14 / 0.01
4           35.31 / 1.65               287.3 / 0
5           35.77 / 0.26               287.3 / 0
6           35.96 / 0                  287.3 / 0



Fig. 6. Clustering a cycle of complete graphs. Edges are of two kinds: internal edges 
that link two nodes of the same complete subgraph and external edges linking nodes of 
different subgraphs. The table shows the sharpened weights of internal/external edges 
after each of six iterations of separation. 




Fig. 7. Clustering the graph of Figure 2. The 3 clusters are denoted by different colors. 




Fig. 8. (a) A weighted graph (edge weights are decaying exponentially with their 
length); (b) decomposition of the graph into two clusters. 
The main virtues of clustering by separation are:

1. Applying a separation operator to an edge in a bounded-degree graph takes 
constant time and space, resulting in very good time complexity for large 
bounded-degree graphs. 

2. Edges are weighted based on the relevant graph structure, thus overcoming 
phenomena like random noise and outliers, which are not reflected directly 
in the structure. 

3. Iterating the separating operators causes information from distant parts of 
the graph to ‘flow in’, reaching the areas where separating decisions are to 
be made. 

Notice that the differences between consecutive iterations of separation
diminish as the process continues, and they appear to tend to a fixpoint. This
behavior requires further investigation.



4.4 Clustering Spatial Point-Sets 

We now illustrate the ability of our method to cluster “correctly” 2D sets of 
points, in a number of typical cases, some of which have been shown to be 
problematic for agglomerative methods [9]. (More extensive examples are given 
in Subsection 5.1.) For a short version of this paper that deals with clustering 
spatial data, see [5]. 

We have used 10-mutual neighborhood graphs for modeling the points. The
$k$-mutual neighborhood graph contains all edges $(a,b)$ for which $a$ is one of
the $k$ nearest neighbors of $b$, and $b$ is one of the $k$ nearest neighbors of $a$.
Regarding edge weights, we adopt a commonly used approach: the weight of the edge
$(a,b)$ is $\exp\left(-\frac{d(a,b)^2}{ave^2}\right)$, where $d(a,b)$ is the
Euclidean distance between $a$ and $b$, and $ave$ is the average Euclidean distance
between two adjacent points in the graph.
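A sketch of the graph construction for the spatial experiments follows. It assumes 2D points given as tuples and breaks distance ties by index. The edge weight is written as $\exp(-d(a,b)^2/ave^2)$, which is our reading of the weighting formula, so treat the exact kernel as an assumption. The default $k = 10$ matches the text; everything else, including the function name, is illustrative.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def mutual_knn_graph(points: Sequence[Point], k: int = 10) -> Dict[int, Dict[int, float]]:
    """k-mutual neighborhood graph with Gaussian-style edge weights."""
    n = len(points)

    def d(a: int, b: int) -> float:
        return math.dist(points[a], points[b])  # Euclidean distance (Python 3.8+)

    # k nearest neighbors of each point, excluding itself; ties broken by index.
    knn = [set(sorted((b for b in range(n) if b != a), key=lambda b: (d(a, b), b))[:k])
           for a in range(n)]
    # Keep (a, b) only if each point is among the other's k nearest neighbors.
    edges = [(a, b) for a in range(n) for b in knn[a] if a < b and a in knn[b]]
    g: Dict[int, Dict[int, float]] = {a: {} for a in range(n)}
    if not edges:
        return g
    ave = sum(d(a, b) for a, b in edges) / len(edges)  # average edge length
    for a, b in edges:
        w = math.exp(-(d(a, b) / ave) ** 2)  # assumed form exp(-d^2 / ave^2)
        g[a][b] = g[b][a] = w
    return g
```

Mutuality is what keeps the graph sparse and cluster-aware: on the collinear points 0, 1, 2, 10, 11 with $k = 2$, point 3 (at 10) lists point 2 among its nearest neighbors, but not vice versa, so no edge crosses the gap.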

The results are achieved using 3 iterations of either CE or NS, with $k = 3$.
For NS, we took the function $sim^k(\cdot,\cdot)$ to be $f^k(\cdot,\cdot)$. In
general, other choices work equally well.

The partition of the edges into separators and non-separators is based on
a threshold value, such that all the edges whose weight is below this value are
declared as separators. Without loss of generality, we may restrict ourselves to
the O(n) edge weights as candidates for being thresholds. The actual threshold
value (or several, if a hierarchy of decompositions is called for) is found by some
statistical test, e.g., by inspecting the edge-weight frequency histogram: the
weights of the separators are usually less frequent, since most of the edges lie
inside the clusters and have higher weights than those of the separators.
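Once a threshold is chosen, clustering reduces to deleting the separators and taking connected components. The sketch below uses union-find for the components (our choice; the helper names are hypothetical) and assumes the threshold has already been picked, for example from the weight histogram. Calling it with several thresholds yields nested decompositions, one per level of a hierarchy.

```python
def clusters_by_threshold(n, weights, threshold):
    """Treat every edge with weight below `threshold` as a separator and return
    cluster labels given by the connected components of the remaining edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), w in weights.items():
        if w >= threshold:  # non-separator: keep the edge
            parent[find(u)] = find(v)
    roots = {}
    return [roots.setdefault(find(x), len(roots)) for x in range(n)]


def candidate_thresholds(weights):
    """Only the distinct edge weights need to be tried as thresholds."""
    return sorted(set(weights.values()))
```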

Figure 9 shows the clustering decomposition of three data sets using our 
algorithm. 

The data set DS1 shows the inherent capability of our algorithms to cluster
at different resolutions at once, i.e., to detect several groups with different
intra-group densities. This ability is beyond the capabilities of many clustering
algorithms that can show the denser clusters only after breaking up the sparser
clusters. Data set DS2 demonstrates the ability of our algorithm to separate
the two left-hand side clusters, despite the fact that the distance between these
clusters is smaller than the distance between points inside the right-hand side
cluster.

The data set DS3 exhibits the capability of our algorithm to take into account 
the structural properties of the data set, which is the only clue for separating 
these evenly spaced points. 




Fig. 9. Clustering of several data sets. Different clusters are indicated by different 
colors. 



When there is a hierarchy of suitable decompositions, our method can reveal 
it by using a different threshold for each level of the hierarchy. For example, 
consider the two data sets of Figure 1. For each of these we have used two 
different thresholds, to achieve two decompositions. The results are given in 
Figure 10. 

5 Integration with Agglomerative Clustering 

The separation operators can be used as a preprocessing stage before activating 
agglomerative clustering on the graph. (Without loss of generality, we think of 
agglomerative algorithms as working on graphs.) Such preprocessing sharpens 
the edge weights, adding structural knowledge to them, and greatly enhances the 
agglomerative algorithms, as it can effectively prevent bad local merging that 
works against the graph structure. 

Implementation of the agglomerative algorithm can be done using a dynamic
graph structure. At each step we take the edge of the highest weight, merge
(“contract”) its two endpoints, and update all the adjacent edges. When
contracting nodes u and v having a common neighbor t, the way we determine the
weight of the edge between t and the contracted node uniquely distinguishes
between different variants of the agglomerative procedure. For example, when
using Single-Link, we take this weight as max{w(v,t), w(u,t)}, while when using
total similarity we fix the weight as w(v,t) + w(u,t). For a bounded degree
graph, which is our case, each such step can be carried out in time O(log n),
using a binary heap.

[Figure 10: the two data sets of Figure 1, each clustered at two resolutions.
Panel thresholds: Threshold ≈ 0; 0.01 (0.06) ≤ Threshold ≤ 27.17 (19.16);
Threshold ≈ 0; 8.97 (18.02) < Threshold < 9.52 (18.39).]

Fig. 10. Clustering at multiple resolutions using different thresholds. When values of
CE are different from values of NS, the CE values are given in parentheses. CE values
are multiplied by 100.
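The contraction loop can be sketched in Python with a binary heap that skips outdated entries lazily instead of updating keys in place. This is an illustrative reading, not the authors' C++ code, and all names are hypothetical. `linkage` selects how the weight of the edge between t and the contracted node is formed. The `"normalized"` option assumes the boundary-size normalization |C|^((d-1)/d) per cluster, which should be checked against the formula for normalized total similarity given in this section.

```python
import heapq


def agglomerate(n, weights, linkage="total", d=2):
    """Greedy edge contraction on an undirected weighted graph.

    weights: {(u, v): w} on nodes 0..n-1.
    linkage: "single"     -> new edge weight max(w(v,t), w(u,t))
             "total"      -> new edge weight w(v,t) + w(u,t)
             "normalized" -> stores the total, but ranks pairs by
                             total / (|C1|**e * |C2|**e), e = (d-1)/d (assumed)
    Returns merges in order as (kept, absorbed, size_kept, size_absorbed).
    """
    adj = [dict() for _ in range(n)]
    for (u, v), w in weights.items():
        adj[u][v] = adj[v][u] = w
    size = [1] * n
    alive = [True] * n

    def priority(u, v):
        w = adj[u][v]
        if linkage == "normalized":
            e = (d - 1) / d
            return w / (size[u] ** e * size[v] ** e)
        return w

    # Max-heap through negated priorities; stale entries are skipped on pop.
    heap = [(-priority(u, v), u, v) for u in range(n) for v in adj[u] if u < v]
    heapq.heapify(heap)
    merges = []
    while heap:
        neg_p, u, v = heapq.heappop(heap)
        if not (alive[u] and alive[v]) or v not in adj[u] or -neg_p != priority(u, v):
            continue  # outdated entry
        merges.append((u, v, size[u], size[v]))
        # Contract v into u and update every edge adjacent to v.
        del adj[u][v], adj[v][u]
        for t, w in adj[v].items():
            del adj[t][v]
            if t in adj[u]:
                w = max(adj[u][t], w) if linkage == "single" else adj[u][t] + w
            adj[u][t] = adj[t][u] = w
        adj[v].clear()
        alive[v] = False
        size[u] += size[v]
        for t in adj[u]:
            heapq.heappush(heap, (-priority(u, t), u, t))
    return merges
```

On a small graph the variants already diverge: with total similarity, a node joined to a merged pair by two weak edges is absorbed before a single slightly stronger edge elsewhere, while Single-Link takes the stronger edge first.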

It is interesting that the clustering method we have described in the previous 
section is in fact equivalent to a Single-Link algorithm preceded by a separation 
operation. Hence we can view the integration of the separation operation with 
the agglomerative algorithm as a generalization of the method we have discussed 
in the previous section, which enables us to use any variant of the agglomerative 
algorithm. 

We have found the normalized total similarity variant particularly effective;
in it we measure the similarity between two clusters as the total sum of the
weights of the original edges connecting these clusters. We would like to eliminate
the tendency of such a procedure to contract pairs of nodes representing large
clusters whose connectivity is high due to their sizes. Accordingly, we normalize
the weights by dividing them by some power of the sizes of the relevant clusters.
More precisely, we measure the similarity of two clusters C1 and C2 by:

$$\frac{w(C_1,C_2)}{\sqrt[d]{|C_1|^{d-1}} \cdot \sqrt[d]{|C_2|^{d-1}}}$$

where w(C1,C2) is the sum of original edge weights between C1 and C2, and d
is the dimension of the space in which the points lie. We took $\sqrt[d]{|C_1|^{d-1}}$
and $\sqrt[d]{|C_2|^{d-1}}$ as an approximation of the size of the boundaries of the
clusters C1 and C2, respectively.

The overall time complexity of the algorithm is O(n log n), which includes
the time needed for constructing the graph and the time needed for performing
n contractions using a binary heap. This equals the time complexity of the
method described in the previous section (because of the graph construction
stage). However, the space complexity is now worse: we need Θ(n) memory for
efficiently handling the binary heap.

Selecting Significant Decompositions 

An agglomerative clustering algorithm provides us with a dendrogram, which is 
a pyramid of nested clustering decompositions. It does not directly address the
question of which decompositions inside the dendrogram are meaningful.

Each level in the dendrogram is constructed from the level below by merging
two clusters. We associate with each level a grade that measures the importance
of that level. Inspired by the work of [2], a rather effective way of measuring the
importance of a level is to evaluate how sharp the change is that this level
introduces into the clustering decomposition. Since changes that involve
small clusters do not have a large impact, we define the prominency rank
of a level in the dendrogram, in which the clusters Ci and Cj of the level below
were merged, as:

$$|C_i| \cdot |C_j|$$

We demonstrate the effectiveness of this measure in the next section.
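Given the sizes of the two clusters merged at each dendrogram level, selecting the most prominent levels is a single sort. This sketch assumes the prominency rank of a level is |C_i|·|C_j|, the product of the sizes of the two merged clusters; `prominent_levels` is a hypothetical name.

```python
def prominent_levels(merges, top=5):
    """Return the `top` dendrogram levels with the highest prominency rank.

    merges: list of (size_i, size_j); level t (1-based) is created by merging
    clusters of sizes size_i and size_j. Levels are returned in dendrogram order.
    """
    ranks = [si * sj for si, sj in merges]
    best = sorted(range(len(ranks)), key=lambda t: ranks[t], reverse=True)[:top]
    return sorted(t + 1 for t in best)
```

Merges that absorb a few noise points into a large cluster score low, while merges of two large clusters score high, which matches the intended behavior.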

5.1 Examples 

In this section we show the results of running our algorithm on several data sets
from the literature. For all the results we have used total similarity agglomerative
clustering, preceded by 2 iterations of the NS separation operator with k = 3
and similarity function defined as cos(·,·). Using the CE operator, changing the
value of k, or increasing the number of iterations does not have a significant effect
on the results. Using the method described in Section 4 may change the results
in a few cases.

We implemented the algorithm in C++, running on a Pentium III 800MHz
processor. The Delaunay triangulation is constructed with the Triangle package,
which is available from URL: http://www.cs.cmu.edu/~quake/triangle.html.
The reader is encouraged to see [6], in order to view the figures of this section
in color.

Figure 11 shows the results of the algorithm on data sets taken from [9].
These data sets contain clusters of different shapes, sizes and densities, and also
random noise. A nice property of our algorithm is that random noise tends to
stay inside small clusters. After clustering the data, the algorithm treats all the
relatively small clusters, whose sizes are below half of the average cluster size,
as noise, and simply omits them, showing only the larger clusters.
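The noise rule is a single pass over the cluster labels. In this sketch (a hypothetical helper; marking noise with the label -1 is our convention), clusters strictly smaller than half the average cluster size are relabeled as noise.

```python
from collections import Counter


def drop_small_clusters(labels):
    """Relabel clusters smaller than half the average cluster size as noise (-1)."""
    if not labels:
        return []
    sizes = Counter(labels)
    threshold = 0.5 * len(labels) / len(sizes)  # half the average cluster size
    return [lab if sizes[lab] >= threshold else -1 for lab in labels]
```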

Figure 12 shows the result of the algorithm applied to a data set from [1]. 
We show two levels in the hierarchy, representing two possible decompositions. 
We are particularly happy with the algorithm’s ability to break the cross-shaped
cluster into 4 highly connected clusters, as shown in Figure 12(c).

In Figure 13, which was produced by adding points to a data set given in [1],
we show the noteworthy capability of our algorithm to identify clusters of
different densities at the same level of the hierarchy. Notice that the intra-distance
between the points inside the right-hand side cluster is larger than the
inter-distance between several other clusters.

The data set in Figure 14, which in a way is the most difficult one we have
included, is taken from [2]. We have modeled the data in exactly the way
described in [2], by putting an edge between every two points whose distance is
below some threshold. Using this model, [2] shows the inability of two spectral
methods and of the Single-Link algorithm to cluster this data set correctly.

Throughout all the examples given in this section, we have used the prominency
rank introduced in Section 5 to reveal the most meaningful levels in the
dendrogram. Figure 15 demonstrates its capability with respect to the data set
DS4 (shown in Figure 11). We have chosen the five levels with the highest
prominency ranks, and for each level we show the level that precedes it. It can be seen
that these five levels are exactly the five places where the six large natural
clusters are merged. In this figure we have chosen not to hide the noise, so the reader
can see the output of the algorithm before the noise is removed.

Table 1 gives the actual running times of the algorithm on the data sets given 
here. We should mention that our code is not optimized, and the running time 
can certainly be improved. 



Table 1. Running time (in seconds; non-optimized) of the various components of the
clustering algorithm

Data Set   Size    Graph construction   Separation   Agglomeration   Overall   Ratio (Size/Sec)
DS4        8000    0.4                  0.88         0.19            1.47      5434
DS5        8000    0.41                 0.83         0.19            1.43      5587
DS6        10000   0.5                  1.12         0.26            1.88      5311
DS7        8000    0.4                  0.89         0.2             1.49      5358
DS8        8000    0.39                 0.93         0.2             1.52      5256
DS9        8000    0.33                 0.66         0.21            1.2       6656
DS10       3374    0.14                 0.26         0.07            0.47      7178



6 Multi-scale Clustering 

In this section we embed our separation operators inside a classical multi-scale 
scheme. The resulting algorithm subsumes the agglomerative variant presented 
in Section 5. 

A multi-scale treatment of a graph-related problem handles the graph in a 
global manner, by constructing a coarse abstraction thereof. The abstraction 
is a new graph that contains considerably fewer nodes than the original, while 
preserving some of its crucial properties. Dealing with a global property on the
coarse graph may be easier, since it contains much less information and, hopefully,
still has the desired property. A multi-scale representation of the graph consists
of a sequence of coarse abstractions that allow us to view the graph at different
scales, each representing a different level of abstraction; see, for example, [10,11].
Fig. 11. Data sets taken from [9] (see [6] for clearer color versions of this figure and of
Figs. 12-15). 



6.1 The General Scheme 

In our context, we find that the multi-scale technique is often called for in order
to identify clusters whose naturalness stems from the graph’s global structure,
and which would be very difficult to identify using only local considerations.
Such clusters are not well separated from their surroundings. For example, there
might be wide ‘channels of leaked information’ between such a cluster and others,
disrupting separation. If we were able to construct a coarse graph in which a wide
channel of connection is replaced by a single separating edge, we would be able
to overcome the difficulty.

[Figure 12 panels: (a) DS9: 8000 points; (b); (c).]

Fig. 12. Two different clusterings of a data set taken from [1]

[Figure 13: DS10: 3374 points.]

Fig. 13. A data set with clusters of different densities

Fig. 14. A 2012-point data set taken from [2]

For example, consider the graph in Figure 3. As mentioned earlier, the natural
clustering decomposition of this graph does not obey the predicate for cluster
normality introduced in Section 4.1, due to the broad five-edge connection
between the two larger parts of the graph. Hence, our separating operators, which
were developed in the spirit of this predicate, will have a hard time identifying
the natural decomposition of this graph. (The operators do manage to naturally
decompose this graph if they are applied with a relatively large value of k,
i.e., k > 5.) The multi-scale solution we propose below overcomes this situation
by constructing a coarse representation similar to that of Figure 2, in which the
natural decomposition is correctly identified. We can then use the decomposition
of the coarse graph to cluster the original one, as long as we have a good way of
establishing a correspondence between the left and right grids of the two graphs,
respectively.

[Figure 15 panels: decompositions with 31, 42, 61, 62, and 88 clusters.]

Fig. 15. A hierarchy containing five decompositions of DS4 corresponding to the five
levels with the highest prominency rank.

Here, now, is a high-level outline of the multi-scale clustering algorithm. 

MS-Clustering(G(V, E, w))

1. compute iterated sharpened weights of G’s edges.
2. if G is small enough then
     return a clustering decomposition of G.
3. construct G^c(V^c, E^c, w^c), a coarser abstraction of G, such that
     |V^c| = α · |V|, where 0 < α < 1.
4. call MS-Clustering(G^c(V^c, E^c, w^c)).
5. obtain a clustering of G by projecting the clustering of G^c onto G.
6. improve the clustering of G using a greedy smoothing procedure.
7. end

6.2 Structure Preserving Coarsening 

Clearly, the key step in the algorithm is the computation of a coarse graph,
which we now set out to describe. A common approach to coarsening graphs is
to use a series of edge-contractions. In a single operation of edge-contraction we
pick some edge (v, u), and combine the two nodes v and u (‘fine nodes’) into
a single super-node v ∪ u (‘coarse node’). In order to preserve the connectivity
information in the coarse graph, we take the set of edges of v ∪ u to be the union
of the sets of edges of v and u. If v and u have a common neighbor t, the weight
of the edge (v ∪ u, t) is taken to be w(v, t) + w(u, t).

A coarse graph is only useful to us if it retains the information related to 
the natural clustering decomposition of the original graph. Hence, we seek what 
we call a structure preserving coarsening in which large-enough natural clusters 
of the original graph are preserved by the coarsening. A key condition for this 
is that a coarse node does not contain two fine nodes that are associated with 
different natural clusters; or, equivalently, that we do not contract a separating 
edge. 

To achieve this, we select the edges to be contracted by considering the sharp- 
ened weights of the edges — those obtained by using our separating operators 
— and contract only edges with high sharpened weights. We would like to elim- 
inate the tendency of such a procedure to contract pairs of large nodes whose 
connectivity is high due to their sizes. Accordingly, we normalize the sharpened 
weights by dividing them by some power of the sizes of the relevant nodes. 
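One way to realize such a coarsening step is a matching-style pass, common in multilevel graph partitioning: contract the edges with the highest normalized sharpened weight first, and touch each node at most once per pass. Everything beyond the text is an assumption: the exponent `power` of the size normalization, the optional `min_weight` cut-off that refuses to contract likely separators, and the function name. Because a matching at most halves the node count, α < 1/2 needs repeated passes. Merged edges get summed weights, matching w(v ∪ u, t) = w(v, t) + w(u, t).

```python
def coarsen(n, weights, sizes=None, alpha=0.5, power=0.5, min_weight=0.0):
    """One structure-preserving coarsening pass (a sketch).

    weights: sharpened edge weights {(u, v): w} on fine nodes 0..n-1.
    sizes:   number of original points inside each fine node (default 1).
    Returns (fine_to_coarse, coarse_weights, coarse_sizes).
    """
    sizes = sizes if sizes is not None else [1] * n
    target = max(1, int(alpha * n))

    def normalized(e):
        u, v = e
        return weights[e] / (sizes[u] * sizes[v]) ** power

    matched = [False] * n
    rep = list(range(n))  # rep[v] = fine node that v is merged into
    count = n
    for u, v in sorted(weights, key=normalized, reverse=True):
        if count <= target:
            break
        if matched[u] or matched[v] or weights[(u, v)] < min_weight:
            continue  # already contracted in this pass, or a likely separator
        matched[u] = matched[v] = True
        rep[v] = u
        count -= 1

    ids = {}
    fine_to_coarse = [ids.setdefault(rep[x], len(ids)) for x in range(n)]
    coarse_sizes = [0] * len(ids)
    for x in range(n):
        coarse_sizes[fine_to_coarse[x]] += sizes[x]
    coarse_weights = {}
    for (u, v), w in weights.items():
        cu, cv = fine_to_coarse[u], fine_to_coarse[v]
        if cu != cv:  # edges inside a super-node disappear
            key = (min(cu, cv), max(cu, cv))
            coarse_weights[key] = coarse_weights.get(key, 0.0) + w
    return fine_to_coarse, coarse_weights, coarse_sizes
```

Projecting a clustering of the coarse graph back to the fine graph is then a lookup: `fine_labels = [coarse_labels[c] for c in fine_to_coarse]`.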

The hope is that the kind of wide connections between natural clusters that 
appear in Figure 3 will show up as sets of separators. This is based on the fact 
that connections between sets of nodes that are related to the same cluster should 
be stronger than connections between sets that are related (even partially) to 
different clusters. 

After finding the clustering decomposition of the coarse graph, we deduce 
the related clustering decomposition of the original graph by a simple projection 
based on the inclusion relation between fine and coarse nodes. The projected 
clustering might need refining: When the wide connections indeed exist, it may 
be hard to find the ‘right’ boundary of a natural cluster, and some local mistakes 
could occur during coarsening. We eliminate this problem by adding a smoothing 
phase (line 6 of the algorithm), in which we carry out an iterative greedy process 
of exchanging nodes between the clusters. The exchanges are done in such a way
that each node joins the cluster that minimizes some global cost function (we
have chosen the multi-way cut between all the clusters). This kind of smoothing
is similar to what is often done in graph partitioning; see, e.g., [10].
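A minimal sketch of this smoothing phase with the multi-way cut as the cost function. A node's contribution to the cut equals its weighted degree minus its edge weight into its own cluster, so moving it to the cluster it is most heavily connected to never increases the cut, and every strict improvement decreases it. The sweep order, the round limit, and the function name are our choices.

```python
def smooth(n, weights, labels, max_rounds=10):
    """Greedily move nodes between clusters to reduce the multi-way cut."""
    adj = [[] for _ in range(n)]
    for (u, v), w in weights.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    labels = list(labels)
    for _ in range(max_rounds):
        changed = False
        for v in range(n):
            link = {}  # total edge weight from v into each neighboring cluster
            for t, w in adj[v]:
                link[labels[t]] = link.get(labels[t], 0.0) + w
            if not link:
                continue
            best = max(link, key=link.get)
            if link[best] > link.get(labels[v], 0.0):  # strict: the cut drops
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels
```

For example, a node misassigned during projection, but connected mostly to its true cluster, is moved back in the first sweep.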

6.3 Relationship to Agglomerative Clustering 

The edge contraction operation in our multi-scale clustering method is essentially 
the same as the merging of two clusters in agglomerative algorithms. 

The main difference between the multi-scale method and the agglomerative
variant introduced in Section 5 is in the use of the separating operators on the
coarse graphs, not only on the original fine graph: at certain levels of the process
we sharpen the edge weights by these operators. This way, the operators act upon
more global properties, which can be identified on the coarser graphs. Another
advantage of the multi-scale algorithm is its use of the smoothing process,
which can undo erroneous merges.

We have found this multi-scale algorithm superior for the task of image seg- 
mentation. A common approach to image segmentation is to represent the image 
by a weighted graph whose nodes are the pixels of the image. There are edges 
connecting each pixel with its four immediate neighbors, and the weight of an
edge is determined by the similarity of the intensity levels of the incident pixels. 
Figure 16 shows the ability of the multi-scale algorithm to accurately separate 
the two vases, in spite of the large connectivity between them. We are still in 
the process of investigating the use of our ideas in image segmentation, and we 
expect to present additional results. 
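A sketch of this pixel-graph model for a grayscale image given as a list of rows. The text only says that edge weights reflect intensity similarity; the Gaussian form exp(-(I_p - I_q)²/σ²), the default σ, and the function name are assumptions, borrowed by analogy from the point-set weighting used earlier.

```python
import math


def pixel_graph(image, sigma=10.0):
    """4-connected pixel graph {(p, q): w}; node id of pixel (r, c) is r * cols + c.

    Assumed weight: exp(-(I_p - I_q)^2 / sigma^2)."""
    rows, cols = len(image), len(image[0])
    weights = {}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    diff = image[r][c] - image[rr][cc]
                    weights[(r * cols + c, rr * cols + cc)] = math.exp(-diff * diff / sigma ** 2)
    return weights
```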





Fig. 16. (a) original 350 × 350 image (taken from [10]); (b) segmentation: each vase
forms its own segment



7 Related Work 

Random walks were first used in cluster analysis in [4]. However, the properties 
of the random walk there are not computed deterministically, but by a random 
algorithm that simulates a 0{n^) random walk. This results in time and space 
complexity of 0{n^) and 0(n^), respectively, even on bounded degree graphs. 

A recent algorithm that uses deterministic analysis of random walks for clus- 
ter analysis is that of [13]. The approach there is quite different from ours. Also, 
its time and space complexity appear to be and 0{n^), respectively, even 

for bounded degree graphs. 

A recently published graph-based approach to clustering, aimed at overcoming
the limitations of agglomerative methods, is [9]. It is hard for us to assess its
quality since we do not have its implementation. However, the running time of
[9], which is O(nm + n log n + m² log m) for m ≈ 0.03n, is higher than ours.

Finally, we mention [3], in which an agglomerative clustering algorithm is 
described that merges the two clusters with the (normalized) greatest number 
of common neighbors. To the best of our knowledge, this is the first agglomerative
algorithm that considers properties related directly to the structure of the graph. 
Our work can be considered to be a rather extensive generalization of this work, 
in the sense that it considers weights of edges and adds considerations related 
to larger neighborhoods. 
8 Conclusion

The process of using random-walk-based separating operators for clustering
seems to have a number of major advantages. One advantage is in the quality
of the resulting clustering. Our algorithms can reveal clusters of any shape
without a special tendency towards spherically shaped clusters or ones of similar
sizes (unlike many clustering algorithms that trade off these features for
robustness against outliers). At the same time, the decisions the algorithms make are
based on the relevant structure of the graph, making them essentially immune
to outliers and noise.

Another advantage is the running time. Our separating operators can be 
applied in linear time when the graphs are of bounded degree, and their running 
time in general is very fast. We have been able to cluster 10,000-node planar 
graphs in less than two seconds, on a 700MHz Pentium III PC. 

In addition to developing the algorithms themselves, we have also attempted 
to define a rather general criterion for the naturalness of a cluster (Section 4.1). 
We hope to use this criterion in the future as the basis of improved algorithms, 
and to better study the connections between it and random-walk-based separa- 
tion. 

Finally, we believe that the structure preserving coarsening introduced in 
Section 6 can be used to improve other algorithms that perform coarsening on 
structured graphs, e.g., multi-scale graph drawing algorithms and multi-level
graph partitioning algorithms [10]. 



References 

1. V. Estivill-Castro and I. Lee, “AUTOCLUST: Automatic Clustering via Boundary
Extraction for Mining Massive Point-Data Sets”, 5th International Conference on
Geocomputation, GeoComputation CD-ROM: GC049, ISBN 0-9533477-2-9.

2. Y. Gdalyahu, D. Weinshall and M. Werman, “Stochastic Image Segmentation by
Typical Cuts”, Proceedings IEEE Conference on Computer Vision and Pattern
Recognition, pp. 588-601, 1999.

3. S. Guha, R. Rastogi and K. Shim, “ROCK: A Robust Clustering Algorithm for
Categorical Attributes”, Proceedings of the 15th International Conference on Data
Engineering, pp. 512-521, 1999.

4. L. Hagen and A. Kahng, “A New Approach to Effective Circuit Clustering”,
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design,
pp. 422-427, 1992.

5. D. Harel and Y. Koren, “Clustering Spatial Data using Random Walks”, Proc. 7th
ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD-2001),
ACM, pp. 281-286, 2001.

6. D. Harel and Y. Koren, “Clustering Spatial Data Using Random Walks”,
Technical Report MCS01-08, Dept. of Computer Science and Applied
Mathematics, The Weizmann Institute of Science, 2001. Available at:
www.wisdom.weizmann.ac.il/reports.html

7. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall,
Englewood Cliffs, New Jersey, 1988.







8. A. K. Jain, M. N. Murty and P. J. Flynn, “Data Clustering: A Review”, ACM
Computing Surveys, 31 (1999), 264-323.

9. G. Karypis, E. Han, and V. Kumar, “CHAMELEON: A Hierarchical Clustering
Algorithm Using Dynamic Modeling”, IEEE Computer, 32 (1999), 68-75.

10. G. Karypis and V. Kumar, “A Fast and High Quality Multilevel Scheme for
Partitioning Irregular Graphs”, SIAM Journal on Scientific Computing 20:1 (1999),
359-392.

11. E. Sharon, A. Brandt and R. Basri, “Fast Multiscale Image Segmentation”,
Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 70-77,
2000.

12. B. Stein and O. Niggemann, “On the Nature of Structure and its Identification”,
Proceedings 25th Workshop on Graph-Theoretic Concepts in Computer Science,
LNCS 1665, pp. 122-134, Springer Verlag, 1999.

13. N. Tishby and N. Slonim, “Data Clustering by Markovian relaxation and the
Information Bottleneck Method”, Advances in Neural Information Processing Systems
13, 2000.




An Introduction to Decidability of DPDA Equivalence



Colin Stirling

Division of Informatics
University of Edinburgh
cps@dcs.ed.ac.uk



1 Introduction 

The DPDA equivalence problem was posed in 1966 [4]: is there an effective
procedure for deciding whether two configurations of a deterministic pushdown
automaton (a DPDA) accept the same language? The problem is whether language
equivalence is decidable for deterministic context-free languages. Despite
intensive work throughout the late 1960s and 1970s, the problem remained unsolved
until 1997 when Senizergues announced a positive solution [11]. It seems that
the notation of pushdown configurations, although simple, is not rich enough to
sustain a proof. Deeper algebraic structure needs to be exposed. The full proof by
Senizergues, in journal form, appeared earlier this year [12]. It exposes structure
within a DPDA by representing configurations as boolean rational series, and
develops an algebraic theory of their linear combinations. Equivalence between
configurations is captured within a deduction system. The equations within the
proof system have associated weights. Higher-level strategies (transformations)
are defined to guide the proof. A novel feature is that these strategies depend
upon differences between weights of their associated equations. Decidability is
achieved by showing that two configurations are equivalent if, and only if, there
is a finite proof of this fact.

I produced a different proof of decidability that is essentially a simplification 
of Senizergues’s proof [14]. It is based on a mixture of techniques developed in 
concurrency theory and language theory. The first step is to view the DPDA 
problem as a bisimulation equivalence problem for a process calculus whose ex- 
pressions generate infinite state transition systems. The process calculus is built 
from determinising strict grammars: strict grammars were introduced by Harri- 
son and Havel [5] because they are equivalent to DPDA. Tableaux proof systems 
have been used to show decidability of bisimulation equivalence between infinite 
state processes, see, for instance, [8,2]. I use this method for the DPDA prob- 
lem. However, the tableau proof system uses conditional proof rules that involve 
distances between premises. Essentially this is Senizergues’s use of weights, and 
the idea was developed from trying to understand his proof. 

The proof of decidability is unsatisfactory. It is very complex because the
proof of termination uses a mechanism for “decomposition” that in [14] is based
on unifiers and auxiliary recursive nonterminals (from [3,13]). Senizergues uses
a more intricate mechanism. This means that the syntax of the starting process
calculus has to be extended in [14] with auxiliary symbols. It also introduces
nondeterminism into tableaux, with the consequence that the decision procedure
(in both [12,14]) consists of two semidecision procedures. The result is that there
is no known upper bound on complexity.

In this paper I describe a simpler decision procedure that should lead to an 
elementary complexity upper bound. It is a deterministic procedure that avoids 
the decomposition mechanism for termination (the rule CUT in [14] and the 
transformation Tq in [12]). Instead, there is a new and much simpler analysis of 
termination. It also means that the syntax of the starting process calculus is not 
extended. The paper is entirely introductory, and contains no proofs. Section 2 
introduces the DPDA problem as a bisimulation equivalence problem. Section 3 
describes some features of the process calculus in more detail. Finally, Section 4 
introduces the deterministic tableau proof decision procedure. The aim of the 
paper is to provide the reader with a clear indication of the decision procedure: 
more details, and full proofs, can be found at the author’s home page. 

2 DPDA and Strict Grammars 

A deterministic pushdown automaton, a DPDA, consists of finite sets of states
P, stack symbols $\mathcal{S}$, alphabet A and basic transitions T. A basic transition is
$pS \xrightarrow{a} q\alpha$ where p, q are states in P, $a \in A \cup \{\varepsilon\}$, S is a stack symbol
in $\mathcal{S}$ and α is a sequence of stack symbols in $\mathcal{S}^*$. Basic transitions are restricted:

if $pS \xrightarrow{a} q\alpha \in T$ and $pS \xrightarrow{a} r\beta \in T$ and $a \in A \cup \{\varepsilon\}$, then $q = r$ and $\alpha = \beta$;

if $pS \xrightarrow{\varepsilon} q\alpha \in T$ and $pS \xrightarrow{a} r\beta \in T$, then $a = \varepsilon$.

A configuration of a DPDA has the form $p\delta$ where $p \in P$ is a state and $\delta \in \mathcal{S}^*$ is
a sequence of stack symbols. The transitions of a configuration are determined
by the following prefix rule, assuming that $\beta \in \mathcal{S}^*$: if $pS \xrightarrow{a} q\alpha \in T$ then
$pS\beta \xrightarrow{a} q\alpha\beta$.

The transition relation $\xrightarrow{a}$, $a \in A \cup \{\varepsilon\}$, between configurations is extended
to words $w \in A^*$. First, $p\alpha \xrightarrow{\varepsilon} p_n\alpha_n$ if $p_n = p$ and $\alpha_n = \alpha$, or there is a
sequence of basic transitions $p\alpha \xrightarrow{\varepsilon} p_1\alpha_1 \xrightarrow{\varepsilon} \cdots \xrightarrow{\varepsilon} p_n\alpha_n$. If $w = av \in A^+$,
then $p\alpha \xrightarrow{w} q\beta$ if $p\alpha \xrightarrow{\varepsilon} p'\alpha' \xrightarrow{a} q'\beta' \xrightarrow{v} q\beta$. A configuration $pS\alpha$ is either
“stable” and has no ε-transitions, or it is “unstable” and has only a single
ε-transition. We assume that a final configuration $p\varepsilon$ with the empty stack is also
stable. Clearly, if $p\alpha \xrightarrow{w} q\beta$ and $p\alpha \xrightarrow{w} r\delta$ and $q\beta$ and $r\delta$ are stable, then
$q = r$ and $\beta = \delta$. The language accepted, or generated, by a configuration
$p\delta$, written $L(p\delta)$, is the set of words $\{w \in A^* : \exists q \in P.\ p\delta \xrightarrow{w} q\varepsilon\}$. That
is, acceptance is by empty stack and not by final state. The motivation for
providing a positive solution to the decision problem is to establish decidability
of language equivalence between deterministic context-free languages. However,
DPDAs which have empty stack acceptance can only recognise the subset of
deterministic context-free languages that are prefix-free: a language L is
prefix-free if, whenever $w \in L$, no proper prefix of w is also in L. By contrast, DPDAs whose
acceptance is by final state do recognise all deterministic context-free languages.
A DPDA with acceptance by final state has an extra component $F \subseteq P$ that
is the subset of accepting states, in which case $L(p\alpha)$ is the set of words
$\{w : p\alpha \xrightarrow{w} q\beta \text{ and } q \in F\}$. For any deterministic context-free language L, there is
a stable configuration of a DPDA with empty stack acceptance that accepts the
language $\{w\$ : w \in L\}$, where $\$$ is a new alphabet symbol, an end marker.
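To make these definitions concrete, here is a small Python simulator for a DPDA with empty-stack acceptance. It applies the prefix rule, follows the single ε-transition of an unstable configuration until a stable one is reached, and accepts w exactly when the start configuration reaches some configuration qε on w. The function name and the example automaton are hypothetical (the automaton is not Example 1); it accepts the prefix-free language {aⁿbⁿ : n ≥ 1} and uses one ε-transition in state r. The sketch assumes the DPDA has no infinite ε-loops.

```python
def dpda_accepts(transitions, state, stack, word):
    """Return True iff word is in L(state stack), acceptance by empty stack.

    transitions maps (p, S, a) to (q, alpha); a == "" marks an ε-transition.
    Stacks are strings of one-character symbols with the top on the left, so
    the prefix rule pSβ --a--> qαβ becomes (p, S+β) -> (q, α+β).
    Assumes there are no infinite ε-loops.
    """
    def stabilize(p, st):
        # An unstable configuration has exactly one ε-move; follow it.
        while st and (p, st[0], "") in transitions:
            p, alpha = transitions[(p, st[0], "")]
            st = alpha + st[1:]
        return p, st

    p, st = stabilize(state, stack)
    for a in word:
        if not st or (p, st[0], a) not in transitions:
            return False  # no a-move from the current stable configuration
        q, alpha = transitions[(p, st[0], a)]
        p, st = stabilize(q, alpha + st[1:])
    return st == ""


# Hypothetical DPDA in normal form accepting {a^n b^n : n >= 1} from pX.
T = {
    ("p", "X", "a"): ("p", "Y"),
    ("p", "Y", "a"): ("p", "YY"),
    ("p", "Y", "b"): ("r", ""),
    ("r", "Y", ""): ("q", "Y"),  # the only ε-transition: rY is unstable
    ("q", "Y", "b"): ("r", ""),
}
```

Because the language is prefix-free, a word such as abab is rejected: after ab the stack is empty and no further move exists.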

The DPDA problem is whether $L(p\alpha) = L(q\beta)$. Clearly, it is sufficient to
restrict the problem to stable configurations. Moreover, one can assume that the
DPDA is in normal form: if $pS \xrightarrow{a} q\alpha \in T$, then $|\alpha| \le 2$; if $pS \xrightarrow{\varepsilon} q\alpha \in T$, then
$\alpha = \varepsilon$; and there are no redundant transitions in T (a transition $pS \xrightarrow{a} q\alpha$ is
redundant if $L(q\alpha) = \emptyset$).

Example 1. Let P = {p, r}, S = {X, Y} and A = {a, b, c}. The basic transitions
T are: pX pX, pX — ^ pe, pX — ^ pX, rX — ^ pt, pY — ^ pe, pY — ^ re, 
pY — ^ pYY and rY — ^ re. This example of a DPDA is in normal form. □ 

$G(p\alpha)$ is the possibly infinite-state deterministic transition graph generated
by a stable configuration $p\alpha$ that abstracts from the basic ε-transitions.
Transitions in $G(p\alpha)$ are only labelled by elements of A and only relate stable
configurations: $q\beta \xrightarrow{a} r\delta$ is a transition if both configurations are stable and
$q\beta \xrightarrow{a} q'\beta' \xrightarrow{\varepsilon} r\delta$. The graph $G(pYX)$ is pictured in Figure 1, where pYX is
a configuration from Example 1. There is a path $p\alpha \xrightarrow{a_1} p_1\alpha_1 \cdots \xrightarrow{a_n} p_n\varepsilon$
in $G(p\alpha)$ if, and only if, $a_1 \ldots a_n \in L(p\alpha)$. Because these graphs are deterministic
and there are no redundant transitions, language equivalence coincides with
bisimulation equivalence.

There is not an obvious relation between the lengths of stacks of equivalent
configurations¹: in the case of Example 1, $L(pY^nX) = L(pY^mX)$ for every m and
n. Techniques for proving decidability of bisimulation equivalence, as developed
in the 1990s [1], use decomposition and congruence that allows substitutivity
of subexpressions in configurations. However, $L(pY^n) = L(pY^m)$ only if n = m.
Moreover, the operation of stack extension is not a congruence. An important
step is to provide a syntactic representation of stable DPDA configurations that
dispenses with ε-transitions and that supports congruence, a process calculus
that can directly generate transition graphs such as $G(pYX)$. The key is
nondeterministic pushdown automata with a single state and without ε-transitions,
introduced by Harrison and Havel [6] as grammars, and further studied in [5,7].

¹ A main attack on the decision problem in the 1970s examined differences between
stack lengths of potentially equivalent configurations, and eventually resulted in a
proof of decidability for real-time DPDAs, which have no ε-transitions [10].

Because the state is redundant, a configuration of a pushdown automaton with a single state is a sequence of stack symbols. The ingredients of such an automaton without $\varepsilon$-transitions, an SDA, are finite sets of stack symbols $S$, alphabet $A$ and basic transitions $T$. Each basic transition has the form $X \xrightarrow{a} \alpha$, where $a \in A$, $X$ is a stack symbol and $\alpha$ is a sequence of stack symbols. A configuration of an SDA is a sequence of stack symbols whose transitions are determined by the prefix rule, assuming $\beta \in S^*$: if $X \xrightarrow{a} \alpha \in T$, then $X\beta \xrightarrow{a} \alpha\beta$. The language $L(\alpha)$ accepted, or generated, by a configuration $\alpha$ is the set $\{w \in A^* : \alpha \xrightarrow{w} \varepsilon\}$, so acceptance is again by empty stack. Unlike pushdown automata with multiple states, language (and bisimulation) equivalence is a congruence with respect to stacking: if $L(\alpha) = L(\beta)$, then $L(\alpha\delta) = L(\beta\delta)$. An SDA can be transformed into normal form: if $X \xrightarrow{a} \alpha \in T$, then $|\alpha| \le 2$ and $L(\alpha) \ne \emptyset$.

Any context-free language that does not contain the empty word $\varepsilon$ is generable by an SDA, and so the language equivalence problem is undecidable. However, if the SDA is deterministic, then the decision problem is decidable. A deterministic SDA, more commonly known as a “simple grammar”, has restricted basic transitions: if $X \xrightarrow{a} \alpha \in T$ and $X \xrightarrow{a} \beta \in T$, then $\alpha = \beta$. Decidability of language equivalence between configurations of a deterministic SDA was proved by Korenjak and Hopcroft in 1966 [9]. However, the languages generable by deterministic SDAs are strictly contained in the languages generable by DPDAs: for instance, $\{a^n b^{n+1} : n \ge 0\} \cup \{a^n c : n \ge 0\}$ is not generable by a deterministic SDA.

Instead of assuming determinism, Harrison and Havel included an extra component, $\equiv$, an equivalence relation on the stack symbols $S$, in the definition of an SDA; it partitions $S$ into disjoint subsets.

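Example 2. The following SDA has alphabet $A = \{a, b\}$ and stack symbols $S = \{A, C, X, Y\}$. The partition of $S$ is $\{\{A\}, \{C\}, \{X\}, \{Y\}\}$. The basic transitions $T$ are $X \xrightarrow{a} YX$, $X \xrightarrow{b} \varepsilon$, $Y \xrightarrow{b} X$, $A \xrightarrow{a} C$, $A \xrightarrow{b} \varepsilon$ and $C \xrightarrow{b} AA$. This SDA is deterministic. □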

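Example 3. The set $S = \{X, Y, Z\}$, $A = \{a, b, c\}$ and $T$ contains $X \xrightarrow{a} X$, $X \xrightarrow{b} \varepsilon$, $X \xrightarrow{c} X$, $Y \xrightarrow{a} \varepsilon$, $Y \xrightarrow{c} YY$, $Z \xrightarrow{b} \varepsilon$, $Z \xrightarrow{c} Z$ and $Z \xrightarrow{c} YZ$. The partition of $S$ is $\{\{X\}, \{Y, Z\}\}$, which means that, for example, $Y \equiv Z$. The graphs of $X$ and $Z$ are illustrated in Figure 2. In particular, $G(Z)$ is nondeterministic because there are two $c$-transitions from $Z$. □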

Fig. 2. The graphs G(X) and G(Z)

The relation $\equiv$ on $S$ is extended to an equivalence relation between sequences of stack symbols, and the same symbol, $\equiv$, is used for the extension: $\alpha \equiv \beta$ if either $\alpha = \beta$, or $\alpha = \delta X\alpha'$ and $\beta = \delta Y\beta'$ and $X \equiv Y$ and $X \ne Y$. An instance of equivalent sequences, from Example 3 above, is $XXYY \equiv XXZ$ because $Y \equiv Z$. Some simple properties of $\equiv$ are: $\alpha\beta \equiv \alpha$ if, and only if, $\beta = \varepsilon$; $\alpha \equiv \beta$ if, and only if, $\delta\alpha \equiv \delta\beta$; if $\alpha \equiv \beta$ and $\gamma \equiv \delta$, then $\alpha\gamma \equiv \beta\delta$; if $\alpha \equiv \beta$ and $\alpha \ne \beta$, then $\alpha\gamma \equiv \beta\delta$; if $\alpha\gamma \equiv \beta\delta$ and $|\alpha| = |\beta|$, then $\alpha \equiv \beta$.
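Because $\delta$ in the definition is the longest common prefix, $\alpha \equiv \beta$ can be decided by inspecting only the first position where the two sequences differ. A minimal Python sketch, assuming the partition of Example 3 is encoded as a hypothetical `BLOCK` map from stack symbols to block indices and sequences are strings:

```python
# Hypothetical encoding of the partition of Example 3: block index of each symbol.
BLOCK = {"X": 0, "Y": 1, "Z": 1}


def seq_equiv(alpha, beta, block=BLOCK):
    """Decide alpha ≡ beta for sequences given as strings of stack symbols.

    Either alpha == beta, or alpha = δXα' and beta = δYβ' with X ≡ Y and
    X != Y; δ is the longest common prefix, so only the first differing
    position matters.
    """
    for x, y in zip(alpha, beta):
        if x != y:
            return block[x] == block[y]
    # No differing position: one sequence is a prefix of the other.
    return len(alpha) == len(beta)


print(seq_equiv("XXYY", "XXZ"))  # True, because Y ≡ Z
print(seq_equiv("XY", "X"))  # False: αβ ≡ α only when β = ε
```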

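Definition 1. The relation $\equiv$ on $S$ is strict when the following two conditions hold. (1) If $X \equiv Y$ and $X \xrightarrow{a} \alpha$ and $Y \xrightarrow{a} \beta$, then $\alpha \equiv \beta$. (2) If $X \equiv Y$ and $X \xrightarrow{a} \alpha$ and $Y \xrightarrow{a} \alpha$, then $X = Y$.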
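Definition 1 can be checked mechanically by comparing every pair of basic transitions with the same label from equivalent stack symbols. The sketch below assumes a hypothetical list-of-triples encoding of Example 3 and reuses the first-difference test for $\equiv$ on sequences; `is_strict` is an illustrative name.

```python
from itertools import product

BLOCK = {"X": 0, "Y": 1, "Z": 1}
# Example 3 (hypothetical encoding): (symbol, letter, right-hand side).
T = [
    ("X", "a", "X"), ("X", "b", ""), ("X", "c", "X"),
    ("Y", "a", ""), ("Y", "c", "YY"),
    ("Z", "b", ""), ("Z", "c", "Z"), ("Z", "c", "YZ"),
]


def seq_equiv(alpha, beta, block):
    """alpha ≡ beta, decided at the first position where they differ."""
    for x, y in zip(alpha, beta):
        if x != y:
            return block[x] == block[y]
    return len(alpha) == len(beta)


def is_strict(transitions, block):
    """Check conditions (1) and (2) of Definition 1 on every pair of rules."""
    for (x, a, alpha), (y, b, beta) in product(transitions, repeat=2):
        if a != b or block[x] != block[y]:
            continue  # only pairs with X ≡ Y and the same label matter
        if not seq_equiv(alpha, beta, block):
            return False  # violates (1)
        if alpha == beta and x != y:
            return False  # violates (2)
    return True


print(is_strict(T, BLOCK))  # True
# Adding Y --b--> ε gives Y and Z the same b-target, violating (2).
print(is_strict(T + [("Y", "b", "")], BLOCK))  # False
```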

An SDA with partition $\equiv$ is strict deterministic (or just strict) if the relation $\equiv$ on $S$ is strict². Examples 2 and 3 above are strict. In the case of Example 2, each block of the partition is a singleton set, and hence, by (1) of Definition 1, the SDA is deterministic. In this extreme case, it follows that $\alpha \equiv \beta$ if, and only if, $\alpha = \beta$, and therefore the SDA is then a simple grammar. If the partition involves larger sets, as is the case in Example 3, then constrained nondeterminism is allowed: there are transitions $Z \xrightarrow{c} Z$ and $Z \xrightarrow{c} YZ$, but $Z \equiv YZ$ because $Y \equiv Z$.

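Proposition 1. (1) If $\alpha \xrightarrow{a} \alpha'$ and $\beta \xrightarrow{a} \beta'$ and $\alpha \equiv \beta$, then $\alpha' \equiv \beta'$. (2) If $\alpha \xrightarrow{a} \alpha'$ and $\beta \xrightarrow{a} \alpha'$ and $\alpha \equiv \beta$, then $\alpha = \beta$. (3) If $\alpha \equiv \beta$ and $w \in L(\alpha)$, then for all words $v$ and $a \in A$, $wav \notin L(\beta)$. (4) If $\alpha \equiv \beta$ and $\alpha \ne \beta$, then $L(\alpha) \cap L(\beta) = \emptyset$.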

The definition of a configuration of an SDA is extended to sets of sequences of stack symbols, $\{\alpha_1, \dots, \alpha_n\}$, written in sum form $\alpha_1 + \dots + \alpha_n$. Two sum configurations are equal, written using $=$, if they are the same set. A degenerate case is the empty sum, written $\emptyset$. The language of a sum configuration is defined using union: $L(\alpha_1 + \dots + \alpha_n) = \bigcup \{L(\alpha_i) : 1 \le i \le n\}$.

Definition 2. A sum configuration $\beta_1 + \dots + \beta_n$ is admissible if $\beta_i \equiv \beta_j$ for each pair of components, and $\beta_i \ne \beta_j$ when $i \ne j$.
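Admissibility is a pairwise check. A minimal sketch, again assuming a hypothetical block-map encoding of the Example 3 partition and taking the components as a list so that duplicates can be detected:

```python
from itertools import combinations

# Hypothetical encoding of the partition of Example 3.
BLOCK = {"X": 0, "Y": 1, "Z": 1}


def seq_equiv(alpha, beta, block=BLOCK):
    """alpha ≡ beta, decided at the first position where they differ."""
    for x, y in zip(alpha, beta):
        if x != y:
            return block[x] == block[y]
    return len(alpha) == len(beta)


def is_admissible(components, block=BLOCK):
    """Definition 2: components are pairwise ≡ and pairwise distinct."""
    for beta_i, beta_j in combinations(components, 2):
        if beta_i == beta_j or not seq_equiv(beta_i, beta_j, block):
            return False
    return True


print(is_admissible(["Z", "YZ", "YYZ"]))  # True
print(is_admissible(["X", "Y"]))  # False: X and Y lie in different blocks
```

The empty list is reported admissible, matching the remark that the empty sum is admissible.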

² More generally, an SDA without a partition is strict deterministic if there exists a strict partition of its stack symbols. Harrison and Havel show that it is decidable whether an SDA is strict deterministic [6].




An Introduction to Decidability of DPDA Equivalence 






The empty sum, $\emptyset$, is therefore admissible. In [7] admissible configurations are called “associates”. Some admissible configurations of Example 3 above are $XX$, $ZZZ + ZZY$, $YX + Z$, $Z + YZ$ and $Z + YZ + YYZ$. An example of a configuration that is not admissible is $X + Y$, because $X \not\equiv Y$. A simple corollary of Proposition 1 is that admissibility is preserved by word transitions: if $\{\beta_1, \dots, \beta_n\}$ is admissible, then $\{\beta' : \beta_i \xrightarrow{u} \beta',\ 1 \le i \le n\}$ is admissible.

A strict SDA can be determinised by determinising the basic transitions $T$ to $T^d$: for each stack symbol $X$ and $a \in A$, the transitions $X \xrightarrow{a} \alpha_1, \dots, X \xrightarrow{a} \alpha_n$ in $T$ are replaced by the single transition $X \xrightarrow{a} \alpha_1 + \dots + \alpha_n$ in $T^d$. The sum configuration $\alpha_1 + \dots + \alpha_n$ is admissible. Therefore, for each stack symbol $X$ and $a \in A$ there is a unique transition $X \xrightarrow{a} \sum_j \alpha_j \in T^d$ (the empty sum $\emptyset$ when $X$ has no $a$-transition in $T$). The prefix rule for generating transitions is also extended to admissible configurations: if $X_1\beta_1 + \dots + X_m\beta_m$ is admissible and $X_i \xrightarrow{a} \sum_j \alpha_{ij} \in T^d$ for each $i$, then $X_1\beta_1 + \dots + X_m\beta_m \xrightarrow{a} \sum_j \alpha_{1j}\beta_1 + \dots + \sum_j \alpha_{mj}\beta_m$. The resulting configuration is admissible. Given a determinised strict SDA and an admissible configuration $E$, the graph $G^d(E)$ is the transition graph generated by $E$, except that redundant transitions, $E' \xrightarrow{a} \emptyset$, are omitted.
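The determinisation and the lifted prefix rule can be sketched in a few lines of Python. Sums are represented as frozensets of strings, with `frozenset()` standing for $\emptyset$ and `""` for $\varepsilon$; this encoding and the names `determinise` and `after` are assumptions for illustration, with Example 3 as input.

```python
# Example 3 (hypothetical encoding): nondeterministic rules, (symbol, letter) -> list.
T = {
    ("X", "a"): ["X"], ("X", "b"): [""], ("X", "c"): ["X"],
    ("Y", "a"): [""], ("Y", "c"): ["YY"],
    ("Z", "b"): [""], ("Z", "c"): ["Z", "YZ"],
}
SYMBOLS, ALPHABET = "XYZ", "abc"


def determinise(rules):
    """T^d: a single sum (a frozenset) per symbol and letter; frozenset() is ∅."""
    return {(x, a): frozenset(rules.get((x, a), [])) for x in SYMBOLS for a in ALPHABET}


TD = determinise(T)


def after(config, word):
    """E · u: apply the prefix rule, lifted to sums, letter by letter."""
    for letter in word:
        config = frozenset(
            alpha + seq[1:]
            for seq in config
            if seq  # the empty sequence ε has no transitions
            for alpha in TD[(seq[0], letter)]
        )
    return config


print(sorted(after(frozenset({"YX", "Z"}), "c")))  # ['YYX', 'YZ', 'Z']
```

The computed table `TD` coincides with the transitions of $T^d$ listed in Example 4.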

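Example 4. In the case of Example 3 above, $T^d$ contains: $X \xrightarrow{a} X$, $Y \xrightarrow{a} \varepsilon$, $Z \xrightarrow{a} \emptyset$, $X \xrightarrow{b} \varepsilon$, $Y \xrightarrow{b} \emptyset$, $Z \xrightarrow{b} \varepsilon$, $X \xrightarrow{c} X$, $Y \xrightarrow{c} YY$ and $Z \xrightarrow{c} YZ + Z$. The graph $G^d(YX + Z)$ is pictured in Figure 3. □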







Fig. 3. The graph G^d(YX + Z)



Admissible configurations of strict SDAs generate exactly the same languages as configurations of DPDAs with empty stack acceptance [6]. There is a straightforward transformation of a DPDA into a strict SDA, and of DPDA configurations into admissible SDA configurations, that preserves language equivalence. Assume a DPDA in normal form with sets $P$, $S$, $A$ and $T$. An SDA is constructed in stages. A) For $p, q \in P$ and $X \in S$, introduce an SDA stack symbol $[pXq]$. B) For transitions, the initial step is to define the following for $a \in A$: if $pX \xrightarrow{a} q\varepsilon \in T$, then $[pXq] \xrightarrow{a} \varepsilon$; if $pX \xrightarrow{a} qY \in T$, then $[pXr] \xrightarrow{a} [qYr]$ for each $r \in P$; if $pX \xrightarrow{a} qYZ \in T$, then $[pXr] \xrightarrow{a} [qYp'][p'Zr]$ for each $r$ and $p'$ in $P$. C) $[pXq]$ is an $\varepsilon$-symbol if $pX \xrightarrow{\varepsilon} q\varepsilon \in T$. All $\varepsilon$-symbols are erased from the right-hand side of any transition given in B. D) Finally, the







Colin Stirling 



SDA is normalised. Clearly, the relation $[pXq] \equiv [pXr]$ is strict. Although the transformation does not preserve determinism, this is overcome by determinising the SDA. Any configuration $pX_1X_2 \cdots X_n$ of the DPDA is transformed into $\mathrm{sum}(pX_1 \cdots X_n) = \sum [pX_1p_1][p_1X_2p_2] \cdots [p_{n-1}X_np_n]$, where the summation is over all $p_i \in P$, after all $\varepsilon$-symbols are erased and all components involving redundant stack symbols are removed. And $L(p\alpha) = L(\mathrm{sum}(p\alpha))$. An example is the conversion of the DPDA of Example 1. The resulting strict SDA is Example 4 above, when $X = [pXp]$, $Y = [pYp]$ and $Z = [pYr]$. The configuration $pYYX$ transforms to $[pYp][pYp][pXp] + [pYp][pYr] + [pYr]$, that is, $YYX + YZ + Z$. Harrison and Havel also prove the converse, that any strict SDA can be transformed into a DPDA [6]. The DPDA problem is therefore equivalent to language equivalence between admissible configurations of a determinised strict SDA (which is the same as the bisimulation equivalence problem).



3 Heads and Tails 

Assume a fixed determinised strict SDA in normal form with ingredients $S$, $A$, $T^d$ and $\equiv$. We assume a total ordering on $A$, and we say that word $u$ is shorter than $v$ if $|u| < |v|$, or $|u| = |v|$ and $u$ is lexicographically smaller than $v$. Let $E, F, G, \dots$ range over admissible configurations, and $E = F$ if they are the same set of sequences. A useful notation is “the configuration $E$ after the word $u$”, written $E \cdot u$, that is, the unique admissible configuration $F$ such that $E \xrightarrow{u} F$, which can be $\emptyset$. The language accepted by configuration $E$, $L(E)$, is $\{u : (E \cdot u) = \varepsilon\}$. Two configurations $E$ and $F$ are language equivalent, written $E \sim F$, if they accept the same language, $L(E) = L(F)$. Language equivalence can also be approximated. If $n \ge 0$, then $E$ and $F$ are $n$-equivalent, written $E \sim_n F$, provided that they accept the same words whose length is at most $n$: for all words $w$ such that $|w| \le n$, $(E \cdot w) = \varepsilon$ if, and only if, $(F \cdot w) = \varepsilon$.
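For small examples, $E \sim_n F$ can be tested directly from the definition by enumerating every word of length at most $n$. This brute-force sketch is exponential in $n$ and is only an illustration; it assumes a frozenset encoding of sums and a hand-written copy of the determinised transitions of Example 4 of Section 2, and the names are hypothetical.

```python
from itertools import product

# Determinised transitions of Example 4 (hypothetical encoding): an empty set is ∅.
TD = {
    ("X", "a"): {"X"}, ("Y", "a"): {""}, ("Z", "a"): set(),
    ("X", "b"): {""}, ("Y", "b"): set(), ("Z", "b"): {""},
    ("X", "c"): {"X"}, ("Y", "c"): {"YY"}, ("Z", "c"): {"YZ", "Z"},
}
ALPHABET = "abc"
EPS = frozenset({""})  # the configuration ε


def after(config, word):
    """E · u, computed with the prefix rule lifted to sums."""
    for letter in word:
        config = frozenset(
            alpha + seq[1:] for seq in config if seq for alpha in TD[(seq[0], letter)]
        )
    return config


def n_equivalent(e, f, n):
    """E ~_n F: for every word w with |w| <= n, (E · w) = ε iff (F · w) = ε."""
    for length in range(n + 1):
        for letters in product(ALPHABET, repeat=length):
            word = "".join(letters)
            if (after(e, word) == EPS) != (after(f, word) == EPS):
                return False
    return True


print(n_equivalent(frozenset({"YX", "Z"}), frozenset({"YYX", "YZ", "Z"}), 5))  # True
print(n_equivalent(frozenset({"Y"}), frozenset({"YY"}), 1))  # False
```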

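Proposition 1. (1) $E \sim F$ if, and only if, for all $n \ge 0$, $E \sim_n F$. (2) If $E \not\sim F$, then there is an $n \ge 0$ such that $E \sim_n F$ and $E \not\sim_{n+1} F$. (3) $E \sim F$ if, and only if, for all $u \in A^*$, $(E \cdot u) \sim (F \cdot u)$. (4) $E \sim_n F$ if, and only if, for all $u \in A^*$ where $|u| \le n$, $(E \cdot u) \sim_{n-|u|} (F \cdot u)$. (5) If $E \sim_n F$ and $0 \le m < n$, then $E \sim_m F$. (6) If $E \sim_n F$ and $F \not\sim_n G$, then $E \not\sim_n G$.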

Definition 1. For each stack symbol $X$, the word $w(X)$ is the shortest word in the set $\{u : (X \cdot u) = \varepsilon\}$.
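Because “shorter” orders words by length first and then lexicographically, $w(X)$ can be found by enumerating words in exactly that order. The sketch below assumes the alphabet order $a < b < c$ and the Example 4 transitions; `shortest_word` and its `max_len` cutoff are illustrative assumptions, not part of the procedure in the text.

```python
from itertools import product

# Determinised transitions of Example 4 (hypothetical encoding): an empty set is ∅.
TD = {
    ("X", "a"): {"X"}, ("Y", "a"): {""}, ("Z", "a"): set(),
    ("X", "b"): {""}, ("Y", "b"): set(), ("Z", "b"): {""},
    ("X", "c"): {"X"}, ("Y", "c"): {"YY"}, ("Z", "c"): {"YZ", "Z"},
}
ALPHABET = "abc"  # assumed total order a < b < c


def after(config, word):
    """E · u, computed with the prefix rule lifted to sums."""
    for letter in word:
        config = frozenset(
            alpha + seq[1:] for seq in config if seq for alpha in TD[(seq[0], letter)]
        )
    return config


def shortest_word(symbol, max_len=10):
    """w(X): least word u in length-then-lexicographic order with (X · u) = ε."""
    start = frozenset({symbol})
    for length in range(max_len + 1):
        for letters in product(ALPHABET, repeat=length):  # lexicographic order
            word = "".join(letters)
            if after(start, word) == frozenset({""}):
                return word
    return None  # no such word of length <= max_len


print({x: shortest_word(x) for x in "XYZ"})  # {'X': 'b', 'Y': 'a', 'Z': 'b'}
```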

A feature of the decision procedure is repeating patterns within admissible configurations. An admissible configuration is written in sum form $\beta_1 + \dots + \beta_n$, where each $\beta_i$ is distinct. The operation $+$ can be extended: if $E$ and $F$ are admissible, $E \cup F$ is admissible and $E$, $F$ are disjoint, $E \cap F = \emptyset$, then $E + F$ is the admissible configuration $E \cup F$. The operation $+$ is partial. Sequential composition, written as juxtaposition, is also used: if $E$ and $F$ are admissible, then $EF$ is the configuration $\{\beta\gamma : \beta \in E \text{ and } \gamma \in F\}$, which is admissible. Some properties are: if $E + F$ is admissible and $u \in L(E)$, then $uv \notin L(F)$; if $E + F$ is admissible, then $L(E) \cap L(F) = \emptyset$; $L(EF) = \{uv : u \in L(E) \text{ and } v \in L(F)\}$. Also, the following identities hold: $E + \emptyset = E = \emptyset + E$, $E\emptyset = \emptyset = \emptyset E$, $E\varepsilon = E = \varepsilon E$, $(E + F)G = EG + FG$ and $G(E + F) = GE + GF$. Admissible configurations can have different “shapes”, using $+$ and sequential composition.

Definition 2. $E = E_1G_1 + \dots + E_nG_n$ is in head/tail form if the head $E_1 + \dots + E_n$ is admissible and at least one $E_i \ne \emptyset$, and each tail $G_i \ne \emptyset$.

Example 1. $E = YYYX + YYZ + YZ + Z$ is an admissible configuration of Example 4 of the previous section. The partition of the stack symbols is $\{\{X\}, \{Y, Z\}\}$. $E$ has head/tail form $YG_1 + ZG_2$, where $G_1 = YYX + YZ + Z$ and $G_2 = \varepsilon$. Also, $E$ has head/tail form $YYH_1 + YZH_2 + ZH_3$, where $H_1 = YX + Z$ and $H_2 = H_3 = \varepsilon$. $(E \cdot c)$ is $(YY \cdot c)H_1 + (YZ \cdot c)H_2 + (Z \cdot c)H_3 = YYYH_1 + YYZH_2 + (YZ + Z)H_3$. $E$ cannot be presented as $YYG'_1 + YYG'_2 + YG'_3 + ZG'_4$: this is not a valid head/tail form because the head $YY + YY + Y + Z$ is not admissible ($YY \not\equiv Y$) and it is not disjoint ($YY + YY$ is not a proper sum). □

In the following, if a configuration $E$ is presented as $E_1G_1 + \dots + E_nG_n$, then assume that it fulfils the conditions of Definition 2 of a head/tail form. The following result lists some properties of head/tail forms. Language equivalence and its approximants are congruences with respect to $+$ and sequential composition. Consequently, head/tail forms allow substitutivity of equivalent subexpressions into tails (because admissibility is preserved).

Proposition 2. Assume $E = E_1G_1 + \dots + E_nG_n$. (1) If $(E_i \cdot u) = \varepsilon$, then for all $j \ne i$, $(E_j \cdot u) = \emptyset$ and $(E \cdot u) = G_i$. (2) If $(E_i \cdot u) \ne \emptyset$, then $(E \cdot u) = (E_1 \cdot u)G_1 + \dots + (E_n \cdot u)G_n$. (3) If $H_i \ne \emptyset$, $1 \le i \le n$, then $E_1H_1 + \dots + E_nH_n$ is a head/tail form. (4) If each $H_i \ne \emptyset$ and each $E_i \ne \varepsilon$, and for each $j$ such that $E_j \ne \emptyset$, $H_j \sim_n G_j$, then $E \sim_{n+1} E_1H_1 + \dots + E_nH_n$. (5) If $H_i \sim G_i$, $1 \le i \le n$, then $E \sim E_1H_1 + \dots + E_nH_n$.

Two configurations may have the same heads and differ in their tails, or may have the same tails and differ in their heads. If $E$ has the head/tail form $E_1G_1 + \dots + E_nG_n$ and $F$ has a similar head/tail form $F_1G_1 + \dots + F_nG_n$ involving the same tails³, then the imbalance between $E$ and $F$, relative to this presentation, is $\max\{|E_i|, |F_i| : 1 \le i \le n\}$. If the imbalance is 0, then they are the same configurations.

Definition 3. If $E = E_1G_1 + \dots + E_nG_n$ and $F = F_1H_1 + \dots + F_mH_m$, then $F$ in its head/tail form is a tail extension of $E$ in its head/tail form provided that each $H_i = K^i_1G_1 + \dots + K^i_nG_n$, $1 \le i \le m$. When $F$ is a tail extension of $E$, the associated extension $e$ is the $m$-tuple $(K^1_1 + \dots + K^1_n, \dots, K^m_1 + \dots + K^m_n)$ without the $G_j$'s, and $F$ is said to extend $E$ by $e$.

Extensions are matrices. If $E''$ extends $E'$ by $e$ and $E'$ extends $E$ by $f$, then $E''$ extends $E$ by $ef$ (“matrix multiplication”). A special instance of an extension occurs when the tails are the same. If $E = E_1G_1 + \dots + E_nG_n$ and $F = F_1G_1 + \dots + F_nG_n$, then $F$ extends $E$ by $e = (\varepsilon + \emptyset + \dots + \emptyset, \dots, \emptyset + \emptyset + \dots + \varepsilon)$. The extension $e$ is abbreviated to the identity $(\varepsilon)$.

³ Any pair of configurations $E$ and $F$ has a head/tail form involving the same tails: $E = EG$ and $F = FG$ when $G = \varepsilon$.

Example 2. The following uses Example 4 of the previous section. $E = YG_1 + ZG_2$, where $G_1 = X$ and $G_2 = \varepsilon$. $E' = YG'_1 + ZG'_2$, where $G'_1 = YX + Z$ and $G'_2 = \varepsilon$. $E'' = YG''_1 + ZG''_2$, where $G''_1 = YYX + YZ + Z$ and $G''_2 = \varepsilon$. $E'$ extends $E$ by $e = (Y + Z, \emptyset + \varepsilon)$ and $E''$ extends $E'$ by $f = e = (Y + Z, \emptyset + \varepsilon)$. Therefore, $E''$ extends $E$ by $ef = (YY + (YZ + Z), \emptyset + \varepsilon)$. □
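Treating an extension as a matrix whose entries are sums makes the composition rule concrete. In this sketch, row $i$ of an extension lists the coefficients $K^i_1, \dots, K^i_n$ of the $i$-th new tail; sums are frozensets of strings, and $+$ is modelled as set union, which assumes the components are disjoint as admissibility requires. The function names are hypothetical.

```python
# Sums are frozensets of strings: frozenset() is ∅ and "" is ε (hypothetical encoding).
def concat(left, right):
    """Sequential composition: {βγ : β in left, γ in right}."""
    return frozenset(b + g for b in left for g in right)


def add(sums):
    """Sum of configurations, modelled as union (assumes the summands are disjoint)."""
    return frozenset().union(*sums)


def compose(e, f):
    """If E'' extends E' by e and E' extends E by f, then E'' extends E by ef."""
    return [
        [add(concat(e[i][j], f[j][k]) for j in range(len(f))) for k in range(len(f[0]))]
        for i in range(len(e))
    ]


def apply_extension(e, tails):
    """New tails: H_i = K^i_1 G_1 + ... + K^i_n G_n."""
    return [add(concat(row[j], tails[j]) for j in range(len(tails))) for row in e]


# Example 2: e = (Y + Z, ∅ + ε) and tails G_1 = X, G_2 = ε.
e = [[frozenset({"Y"}), frozenset({"Z"})], [frozenset(), frozenset({""})]]
g = [frozenset({"X"}), frozenset({""})]
print([sorted(t) for t in apply_extension(compose(e, e), g)])  # [['YYX', 'YZ', 'Z'], ['']]
```

Applying `compose(e, e)` to the tails of $E$ yields the tails $G''_1 = YYX + YZ + Z$ and $G''_2 = \varepsilon$ of $E''$, in agreement with the example.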

4 The Decision Procedure 

The procedure for deciding if $E \sim F$ is to build a goal-directed proof tree, a tableau, with initial goal $E = F$, “is $E \sim F$?”, using proof rules that reduce goals to subgoals. There are just three rules, presented in Figure 4. UNF, for “unfold”, reduces a goal $E = F$ to subgoals $(E \cdot a) = (F \cdot a)$ for each $a$. It is complete and sound. Completeness means that if the goal is true, then so are all the subgoals; soundness is the converse. A finer version, (2) of Fact 1, uses approximants: if the goal fails at level $m + 1$, then at least one subgoal fails at level $m$.



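$$\text{UNF}\qquad \frac{E = F}{(E \cdot a_1) = (F \cdot a_1) \quad \cdots \quad (E \cdot a_k) = (F \cdot a_k)} \qquad A = \{a_1, \dots, a_k\}$$

$$\text{BAL(L)}\qquad \frac{\begin{array}{c} X_1H_1 + \dots + X_kH_k = F \\ \vdots \\ E_1H_1 + \dots + E_kH_k = F' \end{array}}{E_1(F \cdot w(X_1)) + \dots + E_k(F \cdot w(X_k)) = F'}\ \ C$$

$$\text{BAL(R)}\qquad \frac{\begin{array}{c} F = X_1H_1 + \dots + X_kH_k \\ \vdots \\ F' = E_1H_1 + \dots + E_kH_k \end{array}}{F' = E_1(F \cdot w(X_1)) + \dots + E_k(F \cdot w(X_k))}\ \ C$$

where C is the condition

1. Each $E_i \ne \varepsilon$ and at least one $H_i \ne \varepsilon$.

2. There are precisely $\max\{|w(X_i)| : E_i \ne \emptyset,\ 1 \le i \le k\}$ applications of UNF between the top goal and the bottom goal, and no application of any other rule.

3. If $u$ is the word associated with the sequence of UNFs, then $E_i = (X_i \cdot u)$ for each $i : 1 \le i \le k$.

Fig. 4. The tableau proof rules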
Fact 1. (1) If $E \sim F$ and $a \in A$, then $(E \cdot a) \sim (F \cdot a)$. (2) If $E \not\sim_{m+1} F$, then for some $a \in A$, $(E \cdot a) \not\sim_m (F \cdot a)$.

Example 1. Below is an application of UNF where $X$, $Y$ and $Z$ are from Example 4 of Section 2.

$$\begin{array}{l}
YX + Z = YYX + YZ + Z \\
\qquad \text{UNF} \\
X = YX + Z \qquad \varepsilon = \varepsilon \qquad YYX + YZ + Z = YYYX + YYZ + YZ + Z
\end{array}$$

The three subgoals are the result after $a$, $b$ and $c$. □
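The UNF rule and the test for an unsuccessful final goal are straightforward to express on a frozenset encoding of admissible configurations. The following sketch assumes the determinised transitions of Example 4 of Section 2, encoded by hand, and reproduces the three subgoals of the example; `unf` and `unsuccessful_final` are hypothetical names.

```python
# Determinised transitions of Example 4 of Section 2 (hypothetical encoding).
TD = {
    ("X", "a"): {"X"}, ("Y", "a"): {""}, ("Z", "a"): set(),
    ("X", "b"): {""}, ("Y", "b"): set(), ("Z", "b"): {""},
    ("X", "c"): {"X"}, ("Y", "c"): {"YY"}, ("Z", "c"): {"YZ", "Z"},
}


def after_letter(config, letter):
    """(E · a): the prefix rule lifted to admissible sums."""
    return frozenset(
        alpha + seq[1:] for seq in config if seq for alpha in TD[(seq[0], letter)]
    )


def unf(goal, alphabet="abc"):
    """UNF: the goal E = F yields one subgoal (E · a) = (F · a) per letter a."""
    left, right = goal
    return {a: (after_letter(left, a), after_letter(right, a)) for a in alphabet}


def unsuccessful_final(goal):
    """The goal has the form ∅ = E or E = ∅ with E != ∅."""
    left, right = goal
    return (len(left) == 0) != (len(right) == 0)


goal = (frozenset({"YX", "Z"}), frozenset({"YYX", "YZ", "Z"}))
for letter, (lhs, rhs) in unf(goal).items():
    print(letter, sorted(lhs), "=", sorted(rhs))
```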

If $E' = F'$ is a subgoal that is the result of $m$ consecutive applications of UNF (and no other rule) to $E = F$, then there is an associated word $u$ such that $|u| = m$, $E' = (E \cdot u)$ and $F' = (F \cdot u)$. The other two rules in Figure 4 are conditional and involve two premises: the second premise goal reduces to the subgoal beneath it provided that the first premise is above it (on the path back to the root goal). They are BAL rules, for “balance”, involving substitution of subexpressions, which allow a goal to be reduced to a balanced subgoal where the imbalance between the configurations is bounded.

Example 2. An application of BAL(L) uses the stack symbols of Example 2 of Section 2.

$$\begin{array}{l}
XXXXXX = AAAAAA \\
\qquad \text{UNF} \\
YXXXXXX = CAAAAA \qquad \cdots \\
\qquad \text{BAL(L)} \\
YXAAAAA = CAAAAA
\end{array}$$

The second goal is the result of UNF when the label is $a$ (and the other subgoal, for $b$, is omitted). $w(X) = b$, so $m = 1$. Therefore, BAL(L) applies to the second goal: $X_1 = X$, $H_1 = XXXXX$, $E_1 = YX$ and $F = AAAAAA$. So $H_1$ is replaced with $(F \cdot b) = AAAAA$. The imbalance between the configurations of the last goal is 2. □

An application of BAL is said to use $F$ if $F$ is the configuration in the initial goal of the rule (see Figure 4). The BAL rules are sound and complete. Completeness is straightforward; soundness is more intricate. First, “global” soundness of the proof system is explained. If there is a successful tableau whose root is false, then there is a branch of the tableau within which each subgoal is false. The idea is refined using approximants. If the root is false, then there is an offending branch (of false goals) in the tableau within which the approximant indices decrease whenever rule UNF has been applied. Soundness of an application of BAL means that if the two premise goals belong to an offending branch, then the subgoal preserves the level of falsity of the second premise goal.

Proposition 1. (1) If $X_1H_1 + \dots + X_kH_k \sim F$ and $E_1H_1 + \dots + E_kH_k \sim F'$, then $E_1(F \cdot w(X_1)) + \dots + E_k(F \cdot w(X_k)) \sim F'$. (2) If $X_1H_1 + \dots + X_kH_k \sim_{n+m} F$ and $E_1H_1 + \dots + E_kH_k \not\sim_{n+1} F'$ and each $E_i \ne \varepsilon$ and $m \ge \max\{|w(X_i)| : E_i \ne \emptyset\}$, then $E_1(F \cdot w(X_1)) + \dots + E_k(F \cdot w(X_k)) \not\sim_{n+1} F'$.
In Example 2 above, BAL(L) is applied after UNF. However, the other two rules, UNF and BAL(R), also apply. It is intended that there be a unique tableau associated with any initial goal, so restrictions will be placed on which rule is to be applied when. First, the initial premise of a BAL is the one that is “closest” to the goal and, therefore, the one that involves the least number of applications of UNF. To resolve which rule should be applied, the following priority order is assumed: (1) if BAL(L) is permitted, then apply BAL(L); (2) if BAL(R) is permitted, then apply BAL(R); (3) otherwise, apply UNF. However, whether an application of BAL is permitted involves more than fulfilment of the side condition. It also depends on the previous application of a BAL.

Initially, either BAL is permitted provided that its side condition is true. If an application of BAL uses $F$, then the resulting goal contains the configuration $E_1(F \cdot w(X_1)) + \dots + E_k(F \cdot w(X_k))$. Each $E_i$ is a “top” of the application of BAL and $(F \cdot w(X_i))$ is a “bottom”. Assume an application of BAL(L). A subsequent application of BAL(L) is permitted provided the side condition of the rule is fulfilled. However, BAL(R) is not permitted until a bottom of the previous application of BAL(L) is exposed and the side condition of the rule is true. Consider the goal $G_1 = H_1$ in the following branch:

$$\begin{array}{l}
X_1H_1 + \dots + X_kH_k = F \\
\qquad \vdots\ \ \text{BAL(L)} \\
E_1(F \cdot w(X_1)) + \dots + E_k(F \cdot w(X_k)) = H \\
\qquad \vdots\ \ \text{UNFs} \\
G_1 = H_1, \quad \text{where } G_1 = (F \cdot w(X_i)) \\
\qquad \vdots \\
G_k = H_k
\end{array}$$

Between the application of BAL(L) and the goal $G_1 = H_1$ there are no other applications of BAL(L), and $G_1$ is a bottom, $(F \cdot w(X_i))$, of the previous application of BAL(L). BAL(R) is now permitted provided it uses a configuration $G_i$, $i \ge 1$, and the side condition holds. BAL(R) is not permitted using a configuration from a goal above $G_1 = H_1$, even when the side condition is true. The strategy is to apply a BAL rule whenever it is permitted, and if both BAL rules are permitted, then priority lies with BAL(L). If BAL(R) is applied, then the strategy is to repeatedly apply BAL(R), and to use UNF otherwise. BAL(L) is only permitted once a bottom of the previous application of BAL(R) becomes the right-hand configuration of a goal and the side condition holds. The consequence is that when building a tableau proof tree, there is just one choice of which rule to apply next to any subgoal.

Example 3. An initial part of the tableau, continuing on from Example 2 above, is in Figure 5. At goal (*), BAL(L) is applied. Either of the premises (1) and (2) could be the initial premise for the application; however, by the discussion above, it is the lower premise (2). □
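$$\begin{array}{l}
XX^5 = AA^5 \\
\qquad \text{UNF} \\
YXX^5 = CA^5 \qquad X^5 = A^5 \\
\qquad \text{BAL(L)} \\
(1)\ \ YXA^5 = CA^5 \\
\qquad \text{UNF} \\
\emptyset = \emptyset \qquad (2)\ \ XXA^5 = A^7 \\
\qquad \text{UNF} \\
(*)\ \ YXXA^5 = CA^6 \qquad XA^5 = A^6 \\
\qquad \text{BAL(L)} \\
YXA^6 = CA^6
\end{array}$$

Fig. 5. Part of a tableau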



Example 4. Below is the initial part of the tableau for Example 1 above.

$$\begin{array}{l}
(*)\ \ YX + Z = YYX + YZ + Z \\
\qquad \text{UNF} \\
(1) \qquad \varepsilon = \varepsilon \qquad YYX + YZ + Z = YYYX + YYZ + YZ + Z \\
\qquad \text{BAL(L)} \\
YYYX + YYZ + YZ + Z = YYYX + YYZ + YZ + Z
\end{array}$$

where (1) is the subtableau

$$\begin{array}{l}
(**)\ \ X = YX + Z \\
\qquad \text{UNF} \\
X = X \qquad \varepsilon = \varepsilon \qquad X = YYX + YZ + Z \\
\qquad \text{BAL(R)} \\
X = YYX + YZ + Z \\
\qquad \text{UNF} \\
X = YX + Z \qquad \varepsilon = \varepsilon \qquad X = YYYX + YYZ + YZ + Z \\
\qquad \text{BAL(R)} \\
X = YYX + YZ + Z
\end{array}$$

The premise (*) is the initial premise for the application of BAL(L), and (**) is the initial premise for the first BAL(R). The leaf goals are either identities or repeats. In fact, it will turn out that this partial tableau is the completed successful tableau that establishes that $L(pYX) = L(pYYX)$ of Example 1 of Section 2. □

For the tableau construction to be a decision procedure, a notion of final goal is needed so that a tableau can be terminated. The tableau proof rules are locally complete: if a goal is true, then so are its subgoals. Consequently, if an obviously false subgoal is reached, then the root goal is also false. So the criterion for being an unsuccessful final goal is that it is obviously false. This occurs when the goal has the form $\emptyset = E$ or $E = \emptyset$ and $E \ne \emptyset$. The tableau proof rules are also locally sound: if all the subgoals are true, then so is the goal. Therefore, if an obviously true subgoal, $E = E$, is reached, then it should count as a successful final leaf. However, the tableau proof rules are sound in a finer version. In the case of UNF, if the goal is false at level $m + 1$, $E \not\sim_{m+1} F$, then at least one subgoal fails at level $m$, $(E \cdot a) \not\sim_m (F \cdot a)$. And applications of BAL preserve the falsity index. Consequently, if a subgoal $E = F$ is repeated in a branch, and there is at least one application of UNF between the two occurrences, then the second occurrence of $E = F$ can also count as a successful final goal. If the root of the tableau is false, then there is an offending path of false goals in the tableau within which the approximant indices decrease whenever UNF is applied. Consider the branch with $E = F$ occurring twice: if this were an offending branch, then at the first occurrence there is a least $n \ge 0$ such that $E \sim_n F$ and $E \not\sim_{n+1} F$. Therefore, at the second occurrence $E \not\sim_{(n+1)-k} F$, where $k$ is the number of applications of UNF between the two; this is a contradiction when $k \ge 1$.

A repeat is an instance of a more general situation where goals may be growing in size, formally captured below by the “extension theorem”. Roughly speaking, if in a branch there are goals where the rates of change of tails are repeating, then there is a successful final goal. A repeat is the instance where the rate of change is zero. In a long enough branch with multiple applications of BAL, there must be goals within the branch that have the same heads. The idea is to discern patterns of relations between their tails. Definition 3 of the previous section is lifted to goals. Assume $E = E_1H_1 + \dots + E_nH_n$, $F = F_1H_1 + \dots + F_nH_n$, $E' = E'_1G_1 + \dots + E'_mG_m$ and $F' = F'_1G_1 + \dots + F'_mG_m$, and goal $h$ is $E = F$ and goal $g$ is $E' = F'$. Goal $h$ extends $g$ by extension $e$ if $E$ extends $E'$ by $e$. A goal $E = F$ is true at level $m$ if $E \sim_m F$.

Theorem 1. [The extension theorem] Assume there are two families of goals $g(i)$, $h(i)$, $1 \le i \le 2^n$, where each goal $g(i)$ has the form $E_1G^i_1 + \dots + E_nG^i_n = F_1G^i_1 + \dots + F_nG^i_n$ and each goal $h(i)$ has the form $E_1H^i_1 + \dots + E_nH^i_n = F_1H^i_1 + \dots + F_nH^i_n$. Assume extensions $e_1, \dots, e_n$ such that, for each $e_j$ and $i \ge 0$, $g(2^j i + 2^{j-1} + 1)$ extends $g(2^j i + 2^{j-1})$ by $e_j$ and $h(2^j i + 2^{j-1} + 1)$ extends $h(2^j i + 2^{j-1})$ by $e_j$. If each goal $g(i)$, $1 \le i \le 2^n$, is true at level $m$, and each goal $h(j)$, $1 \le j < 2^n$, is true at level $m$, then $h(2^n)$ is true at level $m$.

A simple instance is explained. Consider the proof tree of Example 3. There is a branch where the goals are expanding as follows: $YXA^5 = CA^5, \dots, YXA^6 = CA^6, \dots, YXA^7 = CA^7, \dots$. Between these goals there is at least one application of UNF. To instantiate the extension theorem, $n = 1$. The families of goals are as follows: $g(1)$: $YXG^1 = CG^1$ where $G^1 = A^5$; $g(2) = h(1)$: $YXG^2 = CG^2$ where $G^2 = A^6$; $h(2)$: $YXH^2 = CH^2$ where $H^2 = A^7$. The extension is $(A)$: $g(2)$ extends $g(1)$ by $(A)$ and $h(2)$ extends $h(1)$ by $(A)$. The theorem provides the following result: for any $m$, if $YXA^5 \sim_m CA^5$ and $YXA^6 \sim_m CA^6$, then $YXA^7 \sim_m CA^7$. This justifies that the subgoal $YXA^7 = CA^7$ is a successful final goal. The argument is the same as for a repeating goal, above.

Definition 1. Assume a branch of goals $d(0), \dots, d(l)$. The goal $d(l)$ obeys the extension theorem if there are goals $g(i)$, $h(i)$, $1 \le i \le 2^n$, and extensions $e_1, \dots, e_n$ as described in Theorem 1, the goals belong to $\{d(0), \dots, d(l)\}$, $h(2^n)$ is $d(l)$, and there is at least one application of UNF between goal $h(2^n - 1)$ and $d(l)$.
The second occurrence of a repeating goal in a branch obeys the extension theorem if there is at least one application of UNF between the two occurrences. Assume it has the form $E_1G_1 + \dots + E_nG_n = F_1G_1 + \dots + F_nG_n$. Except for $h(2^n)$, the goals $g(i)$ and $h(i)$ are the first occurrence of the repeating goal, and each extension is the identity, $(\varepsilon)$.

Definition 2. Assume a branch of goals $g(0), \dots, g(n)$, where $g(0)$ is the root goal. The goal $g(n)$ is a final goal in the following circumstances.

1. If $g(n)$ is an identity $E = E$, then $g(n)$ is a successful final goal.

2. If $g(n)$ obeys the extension theorem, then $g(n)$ is a successful final goal.

3. If $g(n)$ has the form $E = \emptyset$ or $\emptyset = E$ and $E \ne \emptyset$, then $g(n)$ is an unsuccessful final goal.

Lemma 1. In any infinite branch of goals $g(0), \dots, g(n), \dots$, where $g(0)$ is the root goal, there is an $n$ such that $g(n)$ is a final goal.

The deterministic procedure that decides whether $E \sim F$ is straightforward, and is defined iteratively.

1. Stage 0: start with the root goal $g(0)$, $E = F$, which becomes a frontier node of the branch $g(0)$.

2. Stage $n + 1$: if a current frontier node $g(n)$ of branch $g(0), \dots, g(n)$ is an unsuccessful final goal, then halt and return “unsuccessful tableau”; if each frontier node $g(n)$ of branch $g(0), \dots, g(n)$ is a successful final goal, then return “successful tableau”; otherwise, for each frontier node $g(n)$ of branch $g(0), \dots, g(n)$ that is not a final goal, apply the next rule to it, and the subgoals that result are the new frontier nodes of the extended branches.

Theorem 2. (1) If $E \not\sim F$, then the decision procedure terminates with “unsuccessful tableau”. (2) If $E \sim F$, then the decision procedure terminates with “successful tableau”.

Theorem 2 establishes decidability of language equivalence between DPDA configurations. Part 1 of Theorem 2 is straightforward. The other half, part 2, is more difficult and uses Lemma 1 above. However, a more refined analysis of the lemma should produce an elementary upper bound on the complexity. Currently, the proof of Lemma 1 abstracts from “oscillation”, whereby goals can increase and decrease in size. A more involved proof would establish that a boundedly finite branch of goals contains a final goal.

The proof of Theorem 1, the extension theorem, follows from the following much simpler result.

Lemma 2. If $E = E_1G_1 + \dots + E_nG_n$, $F = F_1G_1 + \dots + F_nG_n$, $E' = E_1H_1 + \dots + E_nH_n$ and $F' = F_1H_1 + \dots + F_nH_n$, and $E \sim_m F$ and $E' \not\sim_m F'$, then there is a word $u$, $|u| < m$, and an $i$ such that either $(E' \cdot u) = H_i$ and $(F' \cdot u) = (F_1 \cdot u)H_1 + \dots + (F_n \cdot u)H_n$ and $(E' \cdot u) \not\sim_{m-|u|} (F' \cdot u)$, or $(F' \cdot u) = H_i$ and $(E' \cdot u) = (E_1 \cdot u)H_1 + \dots + (E_n \cdot u)H_n$ and $(E' \cdot u) \not\sim_{m-|u|} (F' \cdot u)$.
Abstract. It is shown that the existence of a set in E that is hard for constant depth circuits of subexponential size is equivalent to the existence of a true pseudo-random generator against constant depth circuits.



1 Introduction 

Pseudo-random generators against a class of circuits are functions that take a random seed as input and output a sequence of bits that cannot be distinguished from a truly random sequence by any circuit in the class. They play an important role in many areas, particularly in cryptography and derandomization (see, e.g., [BM84,Yao82]). In this paper, we will be interested in the derandomization aspect of pseudo-random generators, and therefore will use the following definition (as given in [NW94]):

Definition 1. For the class of circuits $\mathcal{C}$, function $G$ is called an $(\ell \mapsto n)$ pseudo-random generator against $\mathcal{C}$ if

- $G = \{G_n\}_{n>0}$ with $G_n : \{0,1\}^{\ell(n)} \mapsto \{0,1\}^n$,

- $G_n$ is computable in time

- for every $n$, and for every circuit $C \in \mathcal{C}$ having $n$ input bits,

$$\left| \Pr_{x \in \{0,1\}^n}[C(x) = 1] - \Pr_{y \in \{0,1\}^{\ell(n)}}[C(G_n(y)) = 1] \right| < \frac{1}{n}.$$
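To make the distinguishing condition of Definition 1 concrete, both probabilities can be computed exactly for tiny parameters by enumerating all inputs and all seeds. The generator and test circuit below are hypothetical toys, not constructions from [NW94]; the test catches the generator because its third output bit is always the XOR of the first two.

```python
from fractions import Fraction
from itertools import product


def bias(circuit, generator, seed_len, out_len):
    """|Pr_x[C(x) = 1] - Pr_y[C(G(y)) = 1]|, computed exactly by enumeration."""
    inputs = list(product((0, 1), repeat=out_len))
    seeds = list(product((0, 1), repeat=seed_len))
    p_random = Fraction(sum(circuit(x) for x in inputs), len(inputs))
    p_pseudo = Fraction(sum(circuit(generator(y)) for y in seeds), len(seeds))
    return abs(p_random - p_pseudo)


def toy_generator(seed):
    """Hypothetical map {0,1}^2 -> {0,1}^4 (not a construction from the text)."""
    y1, y2 = seed
    return (y1, y2, y1 ^ y2, y1 & y2)


def xor_test(x):
    """A test that outputs 1 when the third bit equals the XOR of the first two."""
    return int(x[2] == (x[0] ^ x[1]))


print(bias(xor_test, toy_generator, seed_len=2, out_len=4))  # 1/2
```

With $n = 4$ the bias $1/2$ is not below $1/n = 1/4$, so this toy map fails the condition for any class containing `xor_test`.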

To derandomize a randomized algorithm, one uses an $(\ell \mapsto n)$ pseudo-random generator against a class of circuits that includes the circuit family coding the algorithm, feeds the output of the generator as random input bits to the algorithm for each value of the seed, and then calculates the fraction of ones in the output. Of course, this modified algorithm takes more time: the time taken to compute the generator for every seed value times the time to run the algorithm on every output of the generator. To minimize the time taken, one needs to reduce $\ell(n)$: the best that can be achieved is $\ell(n) = O(\log n)$, and then the increase in time complexity is only by a polynomial factor. Pseudo-random generators that achieve this seed size are called true pseudo-random generators:
Definition 2. For the class of circuits $\mathcal{C}$, function $G$ is called a true pseudo-random generator against $\mathcal{C}$ if $G$ is an $(\ell \mapsto n)$ pseudo-random generator against $\mathcal{C}$ with $\ell(n) = O(\log n)$.
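The derandomization recipe described above (run the algorithm on $G(y)$ for every seed $y$ and take the fraction of ones) can be sketched as follows. The toy algorithm, which errs with probability 1/4 on uniform bits, and the padding generator are hypothetical; the padding map fools this particular algorithm only because the algorithm reads just its first two random bits, and it is not a pseudo-random generator in the sense of Definition 1.

```python
from fractions import Fraction
from itertools import product


def derandomise(algorithm, x, generator, seed_len):
    """Run algorithm(x, G(y)) for every seed y; accept if the fraction of ones exceeds 1/2."""
    outputs = [algorithm(x, generator(seed)) for seed in product((0, 1), repeat=seed_len)]
    return Fraction(sum(outputs), len(outputs)) > Fraction(1, 2)


def toy_algorithm(x, r):
    """Hypothetical randomized test for 'x is even' that errs exactly when r[0] = r[1] = 1."""
    is_even = x % 2 == 0
    return int(is_even != bool(r[0] & r[1]))


def pad_generator(seed):
    """Hypothetical 2-bit -> 8-bit map that pads the seed with zeros."""
    return tuple(seed) + (0,) * 6


print(derandomise(toy_algorithm, 4, pad_generator, 2))  # True
print(derandomise(toy_algorithm, 7, pad_generator, 2))  # False
```

The loop runs the algorithm $2^{\ell}$ times, which is polynomial in $n$ exactly when $\ell(n) = O(\log n)$.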

While true pseudo-random generators against specific algorithms (i.e., where the class against which the generator works includes circuits for a specific algorithm only) are known, very few unconditional pseudo-random generators are known against natural classes of circuits. Perhaps the most notable amongst these are $((\log n)^{O(d)} \mapsto n)$ pseudo-random generators against the class of depth $d$ and size $n$ circuits [Nis91].

In a seminal work, Nisan and Wigderson [NW94] exhibited a connection 
between pseudo-random generators and hard-to-approximate sets in E: 

Definition 3. For a set $A$ and circuit $C$ with $n$ input bits, let

$$\mathrm{adv}_C(A) = \left| \Pr_{x \in \{0,1\}^n}[C(x) = A(x)] - \Pr_{x \in \{0,1\}^n}[C(x) \ne A(x)] \right|.$$

Here we identify $A$ with its characteristic function. For a size bound $s(n)$ of circuits, let $\mathrm{adv}_{s(n)}(A)$ be the maximum of $\mathrm{adv}_C(A)$ where $C$ varies over all size $s(n)$ circuits.

Set $A \in$ E is hard-to-approximate by circuits of size $s(n)$ if $\mathrm{adv}_{s(n)}(A) < \frac{1}{s(n)}$.
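For small $n$, $\mathrm{adv}_C(A)$ can be computed exactly by enumerating $\{0,1\}^n$. The sketch below does this for a single given circuit, modelled as a Python function; computing $\mathrm{adv}_{s(n)}(A)$ would additionally require maximising over all circuits of size $s(n)$, which is not attempted here. The example functions are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def advantage(circuit, target, n):
    """adv_C(A) = |Pr[C(x) = A(x)] - Pr[C(x) != A(x)]| over uniform x in {0,1}^n."""
    total = 2 ** n
    agree = sum(circuit(x) == target(x) for x in product((0, 1), repeat=n))
    return abs(Fraction(agree, total) - Fraction(total - agree, total))


def parity(x):
    return sum(x) % 2


def conjunction(x):
    return int(all(x))


def constant_zero(x):
    return 0


print(advantage(parity, parity, 3))  # 1
print(advantage(constant_zero, conjunction, 3))  # 3/4
```

A single input bit has advantage 0 against PARITY, while the constant 0 circuit already has advantage 3/4 against the AND of three bits, which is why hardness is measured by advantage rather than by error alone.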

Nisan and Wigderson showed that:

Nisan-Wigderson Theorem 1. [NW94] There exist $(\ell \mapsto s(\ell^c))$ pseudo-random generators against class of size circuits (for some size bound $s$ and constant $c > 0$) if and only if there exist sets in E that are hard-to-approximate by circuits of size $s(\ell^d)$ (for some constant $d > 0$).

In fact, the pseudo-random generator of [Nis91] is constructed using the above
theorem and the fact that there exists a set (e.g., PARITY [Has86]) that is
hard-to-approximate by circuits of size $2^{\ell^{\Omega(1/d)}}$ and depth d (the above
theorem of Nisan and Wigderson holds in the presence of a depth restriction too).

An interesting special case is that of true pseudo-random generators, i.e.,
when $s(\ell) = 2^{\Theta(\ell)}$. In that case, [NW94] showed that both the constants c and
d can be set to one, and thus we get:

Nisan-Wigderson Theorem 2. [NW94] There exist true pseudo-random
generators against the class of size-$2^{\delta\ell}$ circuits for some constant $0 < \delta < 1$
if and only if there exist sets in E that are hard-to-approximate by circuits
of size $2^{\epsilon\ell}$ for some constant $0 < \epsilon < 1$.

One of the major implications of the existence of the above true pseudo-random
generators is that BPP = P. In the following, we restrict our attention to true
pseudo-random generators only, as these have the most interesting implications.
So, $n = 2^{O(\ell)}$ throughout the paper.
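The simulation behind this implication, which is the seed-enumeration procedure described at the start of this section, can be sketched as follows. The function names are hypothetical, and the toy "generator" in the usage lines simply repeats its seed, so it is not pseudo-random; it only shows the mechanics. The loop runs $2^{\ell}$ times, which is polynomial in n exactly when $\ell = O(\log n)$.

```python
from fractions import Fraction
from itertools import product


def accept_fraction(algorithm, generator, x, seed_len):
    """Fraction of seeds s in {0,1}^seed_len for which algorithm(x, generator(s)) == 1."""
    ones = sum(algorithm(x, generator(seed)) for seed in product((0, 1), repeat=seed_len))
    return Fraction(ones, 2 ** seed_len)


def derandomized_decision(algorithm, generator, x, seed_len):
    """Deterministic simulation: accept iff more than half of the seeds lead to acceptance."""
    return accept_fraction(algorithm, generator, x, seed_len) > Fraction(1, 2)


# Toy usage (hypothetical): a "generator" that repeats its seed, and an algorithm
# that answers x[0] correctly unless its first three random bits are all 1.
toy_generator = lambda seed: seed + seed
toy_algorithm = lambda x, r: x[0] if not all(r[:3]) else 1 - x[0]

print(accept_fraction(toy_algorithm, toy_generator, (1,), 3))        # 7/8
print(derandomized_decision(toy_algorithm, toy_generator, (1,), 3))  # True
```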
Although [NW94] provides evidence that true pseudo-random generators ex-
ist, it is not clear that hard-to-approximate sets, as required, do exist in E. On
the other hand, it is easier to believe that there exist sets in E that cannot be
solved by subexponential size circuits; in other words, there is a set A in E such
that $\mathrm{adv}_{2^{\epsilon\ell}}(A) < 1$ for some $0 < \epsilon < 1$. Therefore, a major line of research in
the last ten years has been to construct true pseudo-random generators from
this weaker assumption. The approach taken was to start with a set A in E
with $\mathrm{adv}_{2^{\epsilon\ell}}(A) < 1$ and derive another set $B \in$ E from A such that B is
hard-to-approximate by circuits of size $2^{\epsilon'\ell}$, as required in the above theorem.

The above aim was achieved in three steps. First, [BFNW93] constructed,
starting from a set $A_1 \in$ E with $\mathrm{adv}_{2^{\epsilon_1\ell}}(A_1) < 1$, a set $A_2 \in$ E such that
$\mathrm{adv}_{2^{\epsilon_2\ell}}(A_2) < 1 - \frac{1}{\mathrm{poly}(\ell)}$. Then, in [Imp95], a third set $A_3$ was constructed from $A_2$
with $\mathrm{adv}_{2^{\epsilon_3\ell}}(A_3)$ bounded by a constant less than one, and finally in [IW97] a set $A_4$
was constructed from $A_3$ with $\mathrm{adv}_{2^{\epsilon_4\ell}}(A_4) < 2^{-\epsilon_4\ell}$, thus achieving the desired
generalization of the Nisan-Wigderson Theorem 2. In [STV99] two alternative
constructions were given for the same result.

The work in this paper is motivated by the following question: what is
the hardness condition needed for constructing true pseudo-random generators
against classes of circuits more restricted than the class of polynomial-sized cir-
cuits (the class of circuits in the Nisan-Wigderson Theorem 2 is polynomial-sized
in the generator output size, and exponential-sized in the generator input size)?
A natural way of defining such circuits is by restricting their depth. So we can
pose this question for several natural classes of small-depth circuits, e.g., AC^0,
TC^0, NC^1, NC, etc. In analogy with the above result, we should perhaps expect
that to construct pseudo-random generators against polynomial-sized circuits of
depth d, we need a set that is hard for subexponential-sized circuits of depth O(d).

We first observe that the constructions given in [BFNW93,Imp95,IW97] have
the following property: starting with a set that is hard to compute by the class
of circuits of size $2^{\epsilon\ell}$ and depth d, the constructed set is hard-to-approximate by
circuits of size $2^{\delta\ell}$ and depth d - O(1) (for some $\epsilon, \delta > 0$), provided the majority
gate is allowed in the original class of circuits. This implies that for all circuit
classes C that include TC^0, one can construct true pseudo-random generators
against C using a set in E that is hard to compute by subexponential-sized
circuits of the same depth (within a constant factor) as in C.

Therefore, our question is answered for all the well-known circuit classes
except for the class AC^0. AC^0 circuits are polynomial-sized constant-depth cir-
cuits, and it is known that they cannot compute the majority function [Has86].
Therefore, the construction of [BFNW93,Imp95,IW97] does not give the ex-
pected result. Further, this seems to be a fundamental bottleneck, as the other
two constructions given in [STV99] also require at least threshold gates. So we
have an intriguing situation here: even though there exist unconditional, nearly
true pseudo-random generators against AC^0 circuits (given by Nisan [Nis91]),
we do not seem to get conditional true pseudo-random generators
against AC^0 under a condition whose stronger forms give true pseudo-random
generators against larger classes of circuits! It is useful to note here that true
pseudo-random generators against AC^0 circuits are interesting in their own right:
their existence would imply that approximate DNF-counting can be derandom-
ized [KL83].

In this paper, we close this gap in our knowledge to show that: 

Theorem 1. There exist true pseudo-random generators against the class of size-
$2^{\delta\ell}$, depth-O(d) AC^0 circuits for some constant $0 < \delta < 1$ if and only if
there exists a set in E that cannot be computed by AC^0 circuits of size $2^{\gamma\ell}$ and
depth O(d) for some constant $0 < \gamma < 1$.

The idea is to exploit the unconditional pseudo-random generators of Nisan.
The generator of Nisan stretches a seed of size $(\log n)^{O(d)}$ to n bits and works
against depth-d, size-n AC^0 circuits. Moreover, every output bit of the gener-
ator is simply the parity of a subset of the seed bits. Now the crucial observation
is that the parity of poly(log n) bits can be computed by AC^0 circuits, and so if we
compose the Nisan generator with any given circuit C of depth d and size n,
we get another AC^0 circuit of (slightly) larger depth and size that has only
poly(log n) input bits (as opposed to n in C), and yet the circuit accepts roughly
the same fraction of inputs as C. A careful observation of the constructions
of [BFNW93,Imp95,IW97,NW94] shows that if the pseudo-random generator
constructed through them needs to stretch a seed of $\ell$ bits to only poly($\ell$) bits
(instead of $2^{\delta\ell}$ bits), then we need to start from a set in E that is hard to com-
pute by circuits of size $2^{\gamma\ell}$ and depth d that have majority gates over only poly($\ell$)
bits (instead of over $2^{\Omega(\ell)}$ bits). Such majority gates can be replaced by AC^0
circuits of size $2^{o(\ell)}$. Therefore, we only require sets in E that are hard to com-
pute by AC^0 circuits of size $2^{\gamma\ell}$ and depth d'! A minor drawback of the result is
that the true pseudo-random generators that we obtain approximate the fraction
of inputs accepted by a circuit C within $\frac{1}{\mathrm{poly}(\log n)}$, as opposed to $\frac{1}{n}$ in all the
other cases. However, for many applications, e.g., derandomizing approximate
DNF-counting, this weaker approximation is sufficient.
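The composition step can be sketched in a few lines. Each output bit of the hypothetical generator below is the parity of a fixed subset of seed bits, as in Nisan's generator, but the subsets are an arbitrary toy choice and not Nisan's construction. Composing a circuit with it yields a function of the seed bits alone. The usage lines also show why the subsets must be chosen carefully: this particular choice (all non-empty subsets of three seed bits) is easily told apart from uniform by a single OR gate, while a single parity is fooled.

```python
from fractions import Fraction
from itertools import product


def parity_generator(subsets):
    """Return G with G(seed)[j] = XOR of the seed bits indexed by subsets[j]."""
    def G(seed):
        return tuple(sum(seed[i] for i in S) % 2 for S in subsets)
    return G


def compose(circuit, G):
    """C(G(s)), viewed as a function of the short seed s only."""
    return lambda seed: circuit(G(seed))


def acceptance(fn, num_inputs):
    """Fraction of inputs in {0,1}^num_inputs on which fn outputs 1."""
    hits = sum(fn(x) for x in product((0, 1), repeat=num_inputs))
    return Fraction(hits, 2 ** num_inputs)


G = parity_generator([(0,), (1,), (0, 1), (2,), (0, 2), (1, 2), (0, 1, 2)])
or_gate = lambda y: int(any(y))
print(acceptance(or_gate, 7))                     # 127/128 on uniform 7-bit inputs
print(acceptance(compose(or_gate, G), 3))         # 7/8: only the zero seed is rejected
print(acceptance(compose(lambda y: y[2], G), 3))  # 1/2: a single parity is fooled
```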

The organization of the paper is as follows: in the next section we analyze 
the existing constructions and in Section 3 we give our construction. 

2 Depth Increase in Existing Constructions

The construction in [BFNW93,Imp95,IW97,NW94] can be divided into five 
stages: 

Stage 1. Given a set $A_1$ in E such that $\mathrm{adv}_{2^{\epsilon_1\ell}}(A_1) < 1$, construct a function
$f = \{f_\ell\} \in$ E such that for any $\epsilon_f < \epsilon_1$, and for any circuit C of size $2^{\epsilon_f\ell}$,
the fraction of inputs on which C computes $f_\ell$ correctly is at most $1 - \frac{1}{\mathrm{poly}(\ell)}$.
This construction was given in [BFNW93].

Stage 2. From the function f construct a set $A_2 \in$ E such that for any $\epsilon_2 < \epsilon_f$,
$\mathrm{adv}_{2^{\epsilon_2\ell}}(A_2) < 1 - \frac{1}{\mathrm{poly}(\ell)}$. This construction was given in [GL89].

Stage 3. From the set $A_2$ construct a set $A_3 \in$ E such that for any $\epsilon_3 < \epsilon_2$,
$\mathrm{adv}_{2^{\epsilon_3\ell}}(A_3)$ is bounded by a constant less than one. This construction was given in [Imp95].
Stage 4. From the set $A_3$ construct a set $A_4 \in$ E such that for any $\epsilon_4 < \epsilon_3$,
$\mathrm{adv}_{2^{\epsilon_4\ell}}(A_4) < 2^{-\epsilon_4\ell}$. This construction was given in [IW97].

Stage 5. Using the set $A_4$, construct a true pseudo-random generator $G = \{G_n\}$
with $G_n : \{0,1\}^{O(\log n)} \to \{0,1\}^n$ against circuits of size n. This, of course,
was given in [NW94].

We now describe each of these constructions. The correctness of all the con-
structions is shown using a contrapositive argument: given a circuit family
that solves the constructed set (or function) with the specified advantage, we
construct a circuit family that solves the original set (or function) with an ad-
vantage that contradicts the hardness assumption about the original set. For our
purposes, the crucial part of these arguments is the increase in depth and size
of the constructed circuit family over the given circuit family. We do not need
to worry about the complexity of constructing the new set from the original
one; this is important to keep in mind, as this complexity is often very high
(e.g., in Stage 1 and Stage 4).

Several times in the constructions below, we make use of the following (folk-
lore) fact about computing the parity or majority of $\ell$ bits:

Proposition 1. The parity or majority of $\ell$ bits can be computed by AC^0 circuits
of size $2^{\ell^{O(1/d)}}$ and depth d.

Hastad [Has86] provided a (fairly tight) corresponding lower bound: 

Lemma 1. The parity or majority of $\ell$ bits cannot be computed by AC^0 circuits
of size $2^{c\,\ell^{1/(d-1)}}$ and depth d, for a suitable constant c > 0.
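The upper bound of Proposition 1 rests on trading depth for size. The sketch below (hypothetical helper names; it counts only AND terms, which is a simplified cost measure rather than a faithful AC^0 gate count) contrasts the depth-2 DNF for the parity of $\ell$ bits, which needs $2^{\ell-1}$ terms, with a scheme that first computes the parities of blocks of b bits and then takes the parity of those. Roughly speaking, iterating this idea over d levels with blocks of size about $\ell^{1/(d-1)}$ is what keeps the size subexponential.

```python
from itertools import product


def parity_dnf(m):
    """Minterms of parity on m bits: every odd-weight assignment (2^(m-1) terms)."""
    return [a for a in product((0, 1), repeat=m) if sum(a) % 2 == 1]


def eval_dnf(terms, bits):
    """OR over the terms of the AND that checks bits == term."""
    return int(any(tuple(bits) == t for t in terms))


def blocked_parity(bits, block):
    """Parity as a DNF over the parities of blocks, each block handled by its own DNF."""
    blocks = [bits[i:i + block] for i in range(0, len(bits), block)]
    inner = [eval_dnf(parity_dnf(len(b)), b) for b in blocks]
    return eval_dnf(parity_dnf(len(inner)), inner)


def term_count(l, block):
    """AND terms used by blocked_parity on l inputs (ignores negations and wiring)."""
    sizes = [min(block, l - i) for i in range(0, l, block)]
    return sum(2 ** (s - 1) for s in sizes) + 2 ** (len(sizes) - 1)


print(len(parity_dnf(16)))  # 32768 terms for a single flat DNF
print(term_count(16, 4))    # 40 terms with four blocks of four bits
```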

2.1 Stage 1: Analyzing Babai-Fortnow-Nisan-Wigderson’s Construction

Construction of f. Function f is a small-degree, multivariate polynomial
extension of the set $A_1$ over a suitable finite extension field of $\mathbb{F}_2$. More specifi-
cally, function f(x), |x| = $\ell$, is defined as follows (we assume $\ell$ to be a power of
two for convenience):

Fix the field $F = \mathbb{F}_{\ell^2}$. Let $k = \frac{\ell}{2\log\ell}$. Define the polynomial $P(y_1, y_2, \ldots, y_k)$
over F as:

$$P(y_1, y_2, \ldots, y_k) = \sum_{v_1 : |v_1| = \log\ell} \cdots \sum_{v_k : |v_k| = \log\ell} A_1(v_1 v_2 \cdots v_k) \prod_{i=1}^{k} \delta_{v_i}(y_i),$$

where

$$\delta_{v_i}(y_i) = \frac{\prod_{v : |v| = \log\ell \,\wedge\, v \neq v_i} (y_i - v)}{\prod_{v : |v| = \log\ell \,\wedge\, v \neq v_i} (v_i - v)}.$$

Let $x = x_1 x_2 \cdots x_k$ with $|x_i| = 2\log\ell$. Then,

$$f(x) = P(x_1, x_2, \ldots, x_k).$$

Polynomial P has $k = \frac{\ell}{2\log\ell}$ variables, and each variable has degree at most $\ell$.
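A minimal sketch of this low-degree extension, under two simplifying assumptions: arithmetic is over the prime field GF(97) rather than the extension field of $\mathbb{F}_2$ used in the text, and the strings of length $\log\ell$ are represented by the integers in a small set H. The names are hypothetical. The final check confirms that P agrees with the table of $A_1$ on $H^k$; each variable has degree $|H| - 1$.

```python
from itertools import product

P = 97  # hypothetical prime field size, standing in for the field F_{l^2}


def delta(v, y, H):
    """Lagrange basis polynomial for the point set H: 1 at y = v, 0 at the other points of H."""
    num, den = 1, 1
    for u in H:
        if u != v:
            num = num * (y - u) % P
            den = den * (v - u) % P
    return num * pow(den, -1, P) % P


def low_degree_extension(A, H, k):
    """P(y_1..y_k) = sum over (v_1..v_k) in H^k of A(v_1..v_k) * prod_i delta(v_i, y_i)."""
    def poly(*ys):
        total = 0
        for vs in product(H, repeat=k):
            term = A(vs)
            for v, y in zip(vs, ys):
                term = term * delta(v, y, H) % P
            total = (total + term) % P
        return total
    return poly


H = range(4)                                  # plays the role of the strings of length log l
A = lambda vs: (vs[0] * vs[1] + vs[0]) % 2    # an arbitrary 0/1 table on H^2
Pext = low_degree_extension(A, H, 2)
print(all(Pext(*vs) == A(vs) for vs in product(H, repeat=2)))  # True: P extends A
```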
Correctness of construction. Suppose that a family of circuits $\{C_\ell\}$ of size $2^{\epsilon_f\ell}$
exists such that for every $\ell$, $C_\ell$ correctly computes f on more than a $1 - \delta$
fraction of inputs, where $\delta = \frac{1}{\mathrm{poly}(\ell)}$ is the bound of Stage 1. We use this circuit
family to construct a circuit family that correctly computes f, and therefore decides
$A_1$, everywhere.

Fix $\ell$ and x, |x| = $\ell$. String x can be viewed as a point in the k-dimensional
vector space over $\mathbb{F}_{\ell^2}$. Select a random line passing through x in this space. Since
the points of such a line other than x are uniformly distributed, Markov's inequality
shows that with probability at least $\frac{3}{4}$, $C_\ell$ correctly computes f on at least a
$1 - 4\delta$ fraction of the points on the line. Notice that when restricted to
such a line, polynomial P reduces to a univariate polynomial P' of degree at
most $\frac{\ell^2}{2\log\ell}$. Randomly select $\frac{\ell^2}{2\log\ell} + 1$ points on this line and use circuit $C_\ell$ to
find the value of f on these. Given the above event, with probability at least
$1 - 4\delta\left(\frac{\ell^2}{2\log\ell} + 1\right)$ the computed value of f is correct on all the points. Interpolate
polynomial P' using these values and then compute the value of f(x) using P'. The
probability that f(x) is correctly computed is at least
$\frac{3}{4}\left(1 - 4\delta\left(\frac{\ell^2}{2\log\ell} + 1\right)\right) > \frac{2}{3}$ for a sufficiently small inverse polynomial $\delta$. Repeat the same
computation with different random choices $O(\ell)$ times and take the value occurring
the maximum number of times as the value of f(x). The probability that this is
wrong is less than $2^{-\ell}$. Finally, by a union bound, fix a setting of the random bits
that works for all $2^\ell$ different x's. The circuit implementing this algorithm correctly
computes f everywhere (the circuit is non-uniform, though).

Let us now see what the size and depth of this circuit, say C', are compared
to $C_\ell$. Once all the random choices are fixed, C' just needs to use $C_\ell$ on $\frac{\ell^2}{2\log\ell} + 1$
different inputs (computed by XORing a fixed string to x), and then take a linear
combination of the output values¹. As there are $\frac{\ell^2}{2\log\ell} + 1$ outputs, each of size $2\log\ell$,
this can be done by an AC^0 circuit of size $2^{o(\ell)}$. Thus the size of circuit C' is
at most $2^{\epsilon_1\ell}$ as long as $\epsilon_1 > \epsilon_f$, contradicting the assumption. Notice that the depth
of C' is only a constant more than that of $C_\ell$, and C' does not have any majority gate
except those already present in $C_\ell$.
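The reason C' need not actually interpolate P' can be seen in a small sketch: once the sample points $t_j$ on the line are fixed, the value $P'(0) = f(x)$ is a fixed linear combination of the sampled values, with coefficients computed in advance and hardwired. The sketch assumes the prime field GF(97) in place of $\mathbb{F}_{\ell^2}$ and uses hypothetical names; over a field of characteristic two, multiplication by a fixed constant is $\mathbb{F}_2$-linear, so the combination becomes XORs of output bits.

```python
P = 97  # hypothetical prime modulus standing in for the field F_{l^2}


def coefficients_at_zero(points):
    """Lagrange weights lam_j with q(0) = sum_j lam_j * q(t_j) whenever deg q < len(points)."""
    lams = []
    for j, tj in enumerate(points):
        num, den = 1, 1
        for m, tm in enumerate(points):
            if m != j:
                num = num * (-tm) % P
                den = den * (tj - tm) % P
        lams.append(num * pow(den, -1, P) % P)
    return lams


def value_at_zero(lams, samples):
    """Once the sample points are fixed, recovering q(0) is a fixed linear combination."""
    return sum(lam * s for lam, s in zip(lams, samples)) % P


points = [2, 5, 11, 40]                # fixed once the random choices are fixed
lams = coefficients_at_zero(points)    # computed in advance and hardwired
q = lambda t: (5 + 3 * t + 7 * t ** 2 + t ** 3) % P   # a degree-3 restriction P'
print(value_at_zero(lams, [q(t) for t in points]))    # 5, the constant term q(0)
```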



2.2 Stage 2: Analyzing Goldreich-Levin’s Construction 

Construction of $A_2$. Set $A_2$ is defined as: $xr \in A_2$ iff |x| = |r| = $\ell$ and
$f(x) \cdot r = 1$, where $\cdot$ is the inner product over $\mathbb{F}_2$.



Correctness of the construction. Assume that a circuit family $\{C_{2\ell}\}$ of size
$2^{\epsilon_2\ell}$ is given such that $\mathrm{adv}_{C_{2\ell}}(A_2) \geq 1 - \frac{1}{\mathrm{poly}(\ell)}$. As we later need this result for
smaller advantages too, we give the construction assuming that $\mathrm{adv}_{C_{2\ell}}(A_2) \geq \zeta$.
Fix $\ell$ and x, |x| = $\ell$. Define circuit C' as follows:

¹ This linear combination is the degree-zero coefficient of the interpolated polynomial
P'. Notice that circuit C' does not need to interpolate P' (which may not
be possible for a subexponential-sized constant-depth circuit), since the points
at which values of f are given are fixed (once the random choices are fixed), and
therefore the inverse of the corresponding Vandermonde matrix can simply be
hardwired into C'.
Let $t = c \cdot \log\ell$ for a suitable c > 1. Randomly choose t strings $r_1, \ldots, r_t$
with $|r_i| = \ell$. For each non-empty subset J of $\{1, \ldots, t\}$, let $r_J = \bigoplus_{i \in J} r_i$
(these $r_J$'s are pairwise independent, and this is exploited in the proof).
Fix s, |s| = t, and compute $a_J = \bigoplus_{i \in J} s_i$, where $s_i$ is the i-th bit of s. Now
compute the i-th bit of f(x) as the majority of the $2^t - 1$ values (obtained by
varying J) $a_J \oplus C_{2\ell}(x, r_J \oplus e_i)$, where $e_i$ is an $\ell$-bit vector with only the i-th
bit one. Finally, output the guess for f(x) thus computed for each of
the $2^t$ values of s.
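The list-generation step can be sketched as follows, assuming the predictor $C_{2\ell}(x, \cdot)$ for a fixed x is available as a Python function `oracle` (a hypothetical name, as are the others). For determinism the example uses fixed, linearly independent strings $r_1, r_2, r_3$ instead of random ones, and an oracle that errs on a single query; with the correct guess s, each vote equals $u_i$ except possibly one, so the majority recovers the hidden string.

```python
from itertools import combinations, product


def inner(a, b):
    """Inner product of two bit tuples over F_2."""
    return sum(x & y for x, y in zip(a, b)) % 2


def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))


def gl_candidates(oracle, l, rs):
    """Candidate list for a hidden l-bit string u, given oracle(r) ~ <u, r>.

    rs holds t l-bit strings; one candidate is produced per guess s of the bits <u, r_i>.
    """
    t = len(rs)
    subsets = [J for size in range(1, t + 1) for J in combinations(range(t), size)]
    r_J = {}
    for J in subsets:
        acc = (0,) * l
        for i in J:
            acc = xor(acc, rs[i])  # r_J = XOR of r_i over i in J
        r_J[J] = acc
    candidates = []
    for s in product((0, 1), repeat=t):
        guess = []
        for i in range(l):
            e_i = tuple(int(j == i) for j in range(l))
            votes = 0
            for J in subsets:
                a_J = sum(s[j] for j in J) % 2            # guessed value of <u, r_J>
                votes += a_J ^ oracle(xor(r_J[J], e_i))   # equals u_i if guess and oracle are right
            guess.append(int(2 * votes > len(subsets)))   # majority over the 2^t - 1 subsets
        candidates.append(tuple(guess))
    return candidates


u = (1, 0, 1, 1, 0, 0, 1, 0)      # plays the role of f(x) for the fixed x
bad = (1,) * 8                     # the oracle errs on this single query
oracle = lambda r: inner(u, r) ^ (r == bad)
rs = [(1, 0, 0, 1, 1, 0, 1, 0), (0, 1, 1, 0, 1, 1, 0, 0), (1, 1, 0, 0, 0, 1, 1, 1)]
print(u in gl_candidates(oracle, 8, rs))  # True
```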



It was shown in [GL89] that, for a fraction of inputs x that depends on $\zeta$, f(x) is
present in the list of strings output by circuit C' with probability close to
one. Now there are two ways to design a circuit C'' that outputs f(x), depending
on the value of $\zeta$. If $\zeta$ is small, then C'' randomly picks one string output by
C' and outputs it. The probability that it succeeds is close to $2^{-t}$, the inverse of the
list size. Fix the internal random bits used by this circuit by averaging. The resulting
circuit correctly outputs f(x) on a correspondingly smaller fraction of inputs.

The second way is for $\zeta > \frac{1}{2}$. In this case, C'' selects the right string from the
output list of C' as follows (suggested in [Imp95]): randomly choose O(t) many
strings $r \in \{0,1\}^\ell$, and for each string u output by C' test whether $u \cdot r = C_{2\ell}(x, r)$;
output the string u for which the largest number of r's satisfy the test. It was
shown in [Imp95] that, after suitably fixing the random strings r, if f(x) appears in the
output list then C'' will certainly output it. Therefore, the fraction of inputs
on which C'' is correct is at least the fraction of inputs x for which f(x) appears in the list.
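The selection test can be sketched as follows (hypothetical names). For determinism the hardwired test strings here are the unit vectors rather than O(t) random strings, which is an assumption of the sketch; each candidate is scored by how many tests agree with the predictor, and the best-scoring candidate is output.

```python
def inner(a, b):
    """Inner product of two bit tuples over F_2."""
    return sum(x & y for x, y in zip(a, b)) % 2


def select_candidate(candidates, oracle, test_rs):
    """Return the candidate u with the most test strings r satisfying <u, r> == oracle(r)."""
    def score(u):
        return sum(inner(u, r) == oracle(r) for r in test_rs)
    return max(candidates, key=score)


u = (1, 0, 1, 1, 0, 0, 1, 0)
oracle = lambda r: inner(u, r)
tests = [tuple(int(j == i) for j in range(8)) for i in range(8)]   # hardwired test strings
candidates = [(0,) * 8, (1, 0, 1, 1, 0, 0, 1, 1), u, (1,) * 8]
print(select_candidate(candidates, oracle, tests) == u)  # True
```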

In either case, the depth of circuit C'' is only a constant more than that of
$C_{2\ell}$. Although C'' uses majority gates, they are only over $2^t - 1 = \mathrm{poly}(\ell)$ many
inputs, and so can be replaced by constant-depth AC^0 circuits of subexponential size.

Notice that the above two constructions cannot handle intermediate values of $\zeta$,
between those suited to the first way and those above $\frac{1}{2}$. However, these values
of $\zeta$ are never required in the constructions².



2.3 Stage 3: Analyzing Impagliazzo’s Hard-Core Construction 

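This stage has three substages. In the first substage, starting from set $A_2$ with
$\mathrm{adv}_{2^{\epsilon_2\ell}}(A_2) < 1 - \frac{1}{\mathrm{poly}(\ell)}$, a set A' is constructed with a smaller bound on
$\mathrm{adv}_{2^{\epsilon'\ell}}(A')$, for any $\epsilon' < \epsilon_2$. In the next substage, set A'' is constructed from A'
with a still smaller bound on $\mathrm{adv}_{2^{\epsilon''\ell}}(A'')$, for any $\epsilon'' < \epsilon'$. And in the third
substage, set $A_3$ is constructed from A'' with $\mathrm{adv}_{2^{\epsilon_3\ell}}(A_3)$ bounded by a constant
less than one, for any $\epsilon_3 < \epsilon''$.

All three substages are identical. We describe only the first one.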



Construction of A'. Set A' is defined as: $rs \in A'$ iff $|r| = c \cdot \ell$, $|s| = 2\ell$, and
$r \cdot g(s) = 1$, where $g(s) = A_2(x_1) A_2(x_2) \cdots A_2(x_{c\ell})$ with $x_1, \ldots, x_{c\ell}$, $|x_i| = \ell$ (for
an appropriate constant c), generated from s in a pairwise-independent fashion:
let $s = s_1 s_2$ with $|s_1| = |s_2| = \ell$; then $x_i = s_1 \cdot i + s_2$ in the field $\mathbb{F}_{2^\ell}$.

² In fact there is a third way that works for all values of $\zeta$. However, it uses error-
correcting codes, and decoding these appears to require more than constant-depth
subexponential-size circuits. So we cannot use it.
Correctness of the construction. Let a circuit family $\{C_\ell\}$ of size $2^{\epsilon'\ell}$ be
given such that $\mathrm{adv}_{C_\ell}(A')$ exceeds the bound claimed for A'. First invoke the (second)
Goldreich-Levin construction to conclude that there exists a circuit family $\{C'_\ell\}$ of
size $2^{\delta'\ell}$, for $\epsilon' < \delta' < \epsilon_2$, that computes function g(s) on all but a small fraction
of inputs. Fix an $\ell$. Define a circuit C'' as:



On input x, |x| = $\ell$, randomly select $1 \leq z \leq c \cdot \ell$. Then randomly
select the first half $s_1$ of the seed s and let $s_2 = x + s_1 \cdot z$ (this ensures that x
occurs as $x_z$). Use s to generate $x_1, \ldots, x_{c\ell}$. Output the z-th bit of $C'_\ell(s)$
as the guess for $A_2(x)$.
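The pairwise-independent generation $x_i = s_1 \cdot i + s_2$ and the embedding trick $s_2 = x + s_1 \cdot z$ can be checked in a small sketch. It works in GF(2^3) with modulus $x^3 + x + 1$ (a hypothetical small field; the text uses $\mathbb{F}_{2^\ell}$), representing field elements as 3-bit integers, so field addition is XOR. The names are hypothetical.

```python
M = 3
MODULUS = 0b1011  # x^3 + x + 1, irreducible over F_2


def gf_mul(a, b):
    """Multiply two elements of GF(2^M), each given as an M-bit integer."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MODULUS
    return result


def generate(s1, s2, count):
    """x_i = s1 * i + s2 for i = 1..count; field addition is XOR (needs count < 2^M)."""
    return [gf_mul(s1, i) ^ s2 for i in range(1, count + 1)]


def embed(x, s1, z):
    """Second seed half s2 = x + s1 * z, which forces x_z = x."""
    return x ^ gf_mul(s1, z)


# Pairwise independence of (x_2, x_5): the 64 seeds hit each of the 64 pairs exactly once.
pairs = {(xs[1], xs[4]) for xs in (generate(s1, s2, 7) for s1 in range(8) for s2 in range(8))}
print(len(pairs))                          # 64
print(generate(6, embed(5, 6, 3), 7)[2])   # 5: x occurs as x_3
```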



It was shown in [Imp95] that, for any given set $S \subseteq \{0,1\}^\ell$ with $|S| \geq \mu 2^\ell$ (for
a suitable small $\mu$), when input x is randomly selected from S, the probability
that $C''(x) = A_2(x)$ is at least $\frac{1}{2} + \epsilon$ (for a suitable $\epsilon > 0$).

From the circuit C'', construct another circuit C''' as follows: take $O(\log(1/p)/\epsilon^2)$
copies of C'' (using different random bits for each one), and take the majority of their
output values. For any x, if the probability of C''' incorrectly computing $A_2(x)$ is more
than p, then it must be that C'' incorrectly computes $A_2(x)$ with probability
more than $\frac{1}{2} - \epsilon$. By the above property of C'', there cannot be more than $\mu 2^\ell$ such
x's. Therefore, on at least a $1 - \mu$ fraction of inputs, C''' computes $A_2$ correctly
with probability at least $1 - p$. Now fix the random bits of C''' such that the
resulting circuit computes $A_2$ correctly on at least a $(1 - \mu)(1 - p)$ fraction of inputs.

As for the size and depth increase, circuit C''' (as well as the final circuit)
uses one majority gate (on poly($\ell$) inputs) at the top and one bottom layer of parity
gates (on $\ell$ inputs). It also uses poly($\ell$) copies of C'' in parallel. Therefore, the size of
the circuit is at most $2^{\epsilon_2\ell}$ since $\epsilon_2 > \delta'$, and its depth is only a constant more. This
contradicts the assumption about $A_2$.

The above construction of circuit C''' is used again later with different pa-
rameters: starting with a circuit C'' that computes the given set with probability
at least $\frac{1}{2} + \epsilon$ on any subset of strings of size $\Omega(2^\ell)$, we can use the above con-
struction to obtain a circuit C''' that computes the set on a constant fraction
of inputs in a similar fashion. This circuit is constructed by taking the major-
ity of $O(\frac{1}{\epsilon^2})$ copies of another circuit. The value of $\epsilon$ will be crucial in our
calculations there.
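How many copies the top majority gate needs can be sketched numerically. By a Hoeffding bound, the majority of m independent copies, each correct with probability $\frac{1}{2} + \epsilon$, errs with probability at most $\exp(-2m\epsilon^2)$, so $O(\log(1/p)/\epsilon^2)$ copies suffice for error p. This is why $\epsilon = 2^{-\Omega(\ell)}$ forces a fan-in of $2^{\Omega(\ell)}$, whereas $\epsilon = 1/\mathrm{poly}(\ell)$ keeps it at poly($\ell$). The helper names are hypothetical; the binomial tail is computed exactly via log-gamma to avoid overflow.

```python
from math import exp, lgamma, log


def majority_error(m, eps):
    """Probability that the majority of m independent copies (m odd), each correct
    with probability 1/2 + eps (0 < eps < 1/2), outputs the wrong answer."""
    p, q = 0.5 + eps, 0.5 - eps
    total = 0.0
    for k in range(m // 2 + 1):  # wrong iff at most m // 2 copies are correct
        log_term = (lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
                    + k * log(p) + (m - k) * log(q))
        total += exp(log_term)
    return total


def copies_needed(eps, target):
    """Smallest odd m whose majority error is at most target."""
    m = 1
    while majority_error(m, eps) > target:
        m += 2
    return m


for eps in (0.2, 0.1, 0.05):
    print(eps, copies_needed(eps, 0.01))  # grows roughly like 1 / eps^2
```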



2.4 Stage 4: Analyzing Impagliazzo-Wigderson’s Construction 

Construction of $A_4$. Set $A_4$ is defined as: $rs \in A_4$ iff $|r| = \ell$, $|s| = k\ell$ (for a suitable
constant k), and $r \cdot g'(s) = 1$, where $g'(s) = A_3(x_1) A_3(x_2) \cdots A_3(x_\ell)$ with the $x_i$'s
generated from s via a generator whose output is the XOR of the outputs of an
expander-graph-based generator and an NW-design-based generator.



Correctness of the construction. The construction in this stage is very sim-
ilar to the one in the previous stage. Let a circuit family $\{C_\ell\}$ of size $2^{\epsilon_4\ell}$ be given
such that $\mathrm{adv}_{C_\ell}(A_4) \geq 2^{-\epsilon_4\ell}$. Invoke the (first) Goldreich-Levin construction to
obtain a circuit family $\{C'_\ell\}$ of size $2^{\epsilon'\ell}$ computing function g' on a $2^{-\epsilon'\ell}$ fraction of
inputs, for $\epsilon' > \epsilon_4$.

Fix $\ell$. Construct a circuit C'' in a similar fashion (although the analysis
becomes different) that computes $A_3$ with probability at least $\frac{1}{2} + 2^{-\epsilon''\ell}$ (for any
$\epsilon'' > \epsilon'$) on any given set of size $\Omega(2^\ell)$, and then construct C''' from C'' by taking the
majority of $O(2^{2\epsilon''\ell})$ copies of C''. As before, it can be shown that C''' computes
$A_3$ correctly with high probability on all but a small constant fraction of inputs.
Fixing the random bits of C''' suitably gives a circuit that correctly computes $A_3$
on at least a correspondingly large constant fraction of inputs.

The size of circuit C''' is at most $2^{\epsilon_3\ell}$ since $\epsilon_3 > \epsilon''$. The depth of C''' is still
only a constant more than that of $C_\ell$, since the output of the generator used
in the construction of $A_4$ can be easily computed: the output of the generator becomes
fixed upon fixing the random bit values, apart from $\ell$ fixed positions where the
string x is written.

However, the majority gate at the top of C''' has $O(2^{2\epsilon''\ell})$ inputs. This cannot
be computed by AC^0 circuits of constant depth and size $2^{\delta\ell}$ for any $\delta > 0$. In fact,
this is the only place where the depth condition is violated.

2.5 Stage 5: Analyzing Nisan-Wigderson’s Construction 

Construction of generator. Pseudo-random generator $G_n$ is defined as follows: given
a seed s of length $k \cdot \log n$, compute n "nearly disjoint" subsets of bit positions in the
seed, each of size $t \cdot \log n$ (t < k). Let the strings written in these positions be
$x_1, \ldots, x_n$, $|x_i| = t \cdot \log n$. Output $A_4(x_1) A_4(x_2) \cdots A_4(x_n)$.
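One standard way to obtain nearly disjoint subsets is via low-degree polynomials over a prime field; this sketch does not claim that [NW94] uses exactly this design. Positions are pairs (a, b) in GF(q) × GF(q), so the seed has $q^2$ bits, each subset has q positions, and two subsets share fewer than `degree` positions. Parity stands in for $A_4$ purely as a placeholder (it is of course not hard), and all names are hypothetical.

```python
from itertools import product


def polynomial_design(q, degree):
    """For q prime: sets S_p = {(a, p(a)) : a in GF(q)} over all polynomials p of degree < degree.
    Each set has q elements, and two distinct sets share fewer than `degree` elements."""
    design = []
    for coeffs in product(range(q), repeat=degree):
        graph = frozenset(
            (a, sum(c * a ** e for e, c in enumerate(coeffs)) % q) for a in range(q)
        )
        design.append(graph)
    return design


def nw_generator(hard_fn, design):
    """Output bit i is hard_fn applied to the seed bits at the positions in design[i]."""
    def G(seed):  # seed: dict mapping each position (a, b) to a bit
        return tuple(hard_fn(tuple(seed[pos] for pos in sorted(S))) for S in design)
    return G


q, degree = 5, 3
D = polynomial_design(q, degree)
print(len(D), max(len(S & T) for S in D for T in D if S != T))  # 125 2
seed = {(a, b): (a * b + a) % 2 for a in range(q) for b in range(q)}  # a 25-bit seed
out = nw_generator(lambda bits: sum(bits) % 2, D)(seed)              # 125 output bits
```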

Correctness of the construction. Let C be a circuit of size n such that

$$\left| \Pr_{x \in \{0,1\}^n}[C(x) = 1] - \Pr_{s \in \{0,1\}^{k \log n}}[C(G_n(s)) = 1] \right| > \frac{1}{n}.$$

Define circuit C* as:

On input $x_i$ and $A_4(x_1) \cdots A_4(x_{i-1})$, randomly select a bit b and a string
r of length n - i. Compute $o = C(A_4(x_1) \cdots A_4(x_{i-1})\, b\, r)$. Output $b \oplus o$.

It was shown in [NW94] that for at least one i, C* correctly computes $A_4(x_i)$
on at least a $\frac{1}{2} + \frac{1}{n^2}$ fraction of inputs.

Exploiting the property that the subsets of bit positions determining each
of $x_1, \ldots, x_{i-1}$ are nearly disjoint from those determining $x_i$, one can fix the
random bits of C* and the bits of the seed s, except for those that determine $x_i$,
such that the advantage of C* in computing $A_4(x_i)$ is preserved and the value of
$A_4(x_j)$ (for j < i) is needed by the circuit (as $x_i$ varies) for at most n different
inputs. So all values of $A_4$ needed by the circuit (at most $n^2$) can be hardwired
into it, thus eliminating the need to provide $A_4(x_1) \cdots A_4(x_{i-1})$ as part of the
input.

Let the final circuit be C''. The size of C'' is $n^{O(1)}$ and $\mathrm{adv}_{C''}(A_4) \geq \frac{2}{n^2}$
on inputs of size $t \cdot \log n$. For a suitable choice of t and k, this contradicts the
hardness of $A_4$. The depth of C'' is only a constant more than the depth of
C, as the only additional computation needed is to select the correct hardwired
values of $A_4(x_1), \ldots, A_4(x_{i-1})$ depending on the input $x_i$ (this is a simple table
lookup).



2.6 Analyzing Constructions of Sudan-Trevisan-Vadhan

The above bottleneck prompts us to look at other constructions of true pseudo-
random generators in the literature: there are two such known constructions,
given in [STV99]. However, both these constructions have similar bottle-
necks. We point out these bottlenecks below:

First construction. This construction uses a false entropy generator. This gen-
erator makes use of the hard-core result of Impagliazzo [Imp95]. The value
of $\epsilon$ that the construction requires in the hard-core result is $2^{-\Omega(\ell)}$. So this
has the same problem as the construction of [IW97]: it requires computing
the majority of $2^{\Omega(\ell)}$ bits.

Second construction. This construction actually shows that Stages 3 and 4
above can be bypassed. In other words, the multivariate polynomial P has
enough redundancy to directly ensure that no circuit family of size $2^{\epsilon\ell}$ can
compute the function on more than a $2^{-\epsilon\ell}$ fraction of inputs. However, the proof
of this result is far more involved than the proof of Stage 1. In the proof, to
interpolate the polynomial correctly on a random line, at least $2^{\Omega(\ell)}$ samples
are needed. This requires, among other things, XORing $2^{\Omega(\ell)}$ bits and also
computing the $2^{\Omega(\ell)}$-th power of a given element in a field of exponential size. None of
these can be performed by constant-depth AND-OR circuits of size $2^{O(\ell)}$.

3 Proof of Theorem 1 

The problem in working with AC^0 circuits is that they are too weak to do even
simple computations. But we can use this drawback to our advantage! Since good
lower bounds for AC^0 circuits are known [Has86], one can construct uncondi-
tional pseudo-random generators against such circuits. In [Nis91], Nisan used
lower bounds on the parity function to obtain pseudo-random generators against
depth-d, size-n AC^0 circuits that stretch seeds of size $(\log n)^{O(d)}$ to n bits. More-
over, each output bit of these generators is simply the parity of some of the seed
bits. Therefore, each output bit of the generator can be computed by an AC^0
circuit of size polynomial in n and depth O(d).

So, given an AC^0 circuit C of depth d and size n that accepts an S fraction of in-
puts, when we combine this circuit with the pseudo-random generator of [Nis91],
we get another AC^0 circuit of depth O(d) and size polynomial in n that has only
$(\log n)^{O(d)}$ input bits and still accepts an $S \pm \frac{1}{n}$ fraction of inputs. Let us try to
construct a true pseudo-random generator against such a circuit using the
Nisan-Wigderson construction. This generator needs to stretch $O(\log n)$ bits to
$(\log n)^{O(d)}$ bits. If we examine the Nisan-Wigderson construction of the generator,
it is apparent that, if we fix the approximation error to $\frac{1}{\mathrm{poly}(\log n)}$ instead of $\frac{1}{n}$,
such a generator can be constructed provided there exists a set $A \in$ E such that for any
depth-O(d) circuit family $\{C_\ell\}$ of size $2^{\delta\ell}$, $\mathrm{adv}_{C_\ell}(A) < \frac{1}{\mathrm{poly}(\ell)}$. Now notice that
such a set can be easily constructed by modifying Stage 4 of the construc-
tion! Since instead of $\epsilon = 2^{-\Omega(\ell)}$ we now have $\epsilon = \frac{1}{\mathrm{poly}(\ell)}$, the majority gate needed
in the construction will have a fan-in of only poly($\ell$), and this can be computed by
constant-depth AND-OR circuits of subexponential size³. Hence the overall con-
struction now becomes a six-stage one: the first three stages are identical to the
ones described above; the fourth stage is modified for the weaker approximation
needed; the fifth stage uses the Nisan-Wigderson construction for a pseudo-random
generator that stretches the seed only polynomially; and this stretched seed acts as
the seed for the Nisan generator in the final stage, which stretches the output to n bits.

It is interesting to note that each output bit of this pseudo-random generator
is simply an XOR of several bits of the characteristic function of $A_1$: the multi-
variate polynomial construction in Stage 1 is just an XOR of some input bits;
the Stage 2, 3, and 4 constructions are clearly simple XORs (computing which bits
to XOR requires some effort, though); the fifth stage merely copies some bits
from input to output; and the last stage (which uses the parity function) also XORs
some bits.
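Why XOR-based stages compose into a single XOR can be sketched by representing each $\mathbb{F}_2$-linear stage as a list of index sets (a hypothetical representation, with hypothetical names): composing two stages only requires symmetric differences, since an index used an even number of times cancels.

```python
def compose_linear(outer, inner):
    """Stages are lists of index sets: y_i = XOR of x[j] for j in inner[i], and
    z_k = XOR of y[i] for i in outer[k]. Returns, for each z_k, the x-indices it XORs."""
    result = []
    for S in outer:
        acc = set()
        for i in S:
            acc ^= set(inner[i])  # symmetric difference: repeated indices cancel mod 2
        result.append(frozenset(acc))
    return result


def apply_linear(stage, bits):
    """Evaluate a linear stage on a tuple of bits."""
    return tuple(sum(bits[j] for j in S) % 2 for S in stage)


inner_stage = [{0, 1}, {1, 2}, {2, 3}, {0}, {3}]
outer_stage = [{0, 1}, {2, 3, 4}, {0, 1, 2}]
combined = compose_linear(outer_stage, inner_stage)
print(sorted(map(sorted, combined)))  # [[0, 2], [0, 2], [0, 3]]
```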
Abstract. For any class C closed under NC^1 reductions, it is shown
that all sets complete for C under first-order (equivalently, Dlogtime-
uniform AC^0) reductions are isomorphic under first-order computable
isomorphisms.



1 Introduction 

One of the long-standing conjectures about the structure of complete sets is the
isomorphism conjecture (proposed in [BH77]), stating that all sets complete for
NP under polynomial-time reductions are polynomial-time isomorphic. As the
conjecture cannot be resolved either way unless we discover non-relativizable
techniques (see [KMR88,KMR89,FFK92] for more details), efforts have been
made to prove the conjecture in restricted settings by restricting the power of
reductions (see, for example, [Agr96,AAR98]). One of the most natural definitions
of restricted reductions is that of functions computed by uniform constant-depth
(or AC^0) circuits (first studied in [CSV84]). These reductions provide the right
notion of completeness for small complexity classes (logspace and below). Also,
it has been observed that natural complete problems for various complexity
classes remain complete under such reductions [IL95,Imm87]. Although the class
of AC^0 functions is much smaller than the class of polynomial-time functions,
it is interesting to note that, until recently, there was no known example of an
NP-complete set that is not complete under uniform AC^0 reductions [AAI+97].

The notion of uniformity to be used with AC^0 circuits is widely accepted
to be Dlogtime-uniformity (see Section 3 for the definition). Under this
uniformity condition, these circuits admit a number of different characteriza-
tions [BIS90,AG91]: functions computed by first-order logic formulae [Lin92],
O(1)-alternating log-time TMs [Sip83], logspace rudimentary predicates [Jon75],
etc.

The isomorphism conjecture for complete sets for NP under AC^0 reductions
has been studied before. Allender et al. [ABI93] showed that all sets complete
under first-order projections (these are very simple functions computed by uni-
form circuits with no gates [IL95]) are Dlogtime-uniform AC^0-isomorphic (i.e.,
the isomorphism between any two such sets is computable in both directions by
Dlogtime-uniform AC^0 circuits). This was improved in [AAR98], who showed that
all sets complete under u-uniform (for any u) AC^0 reductions are non-uniform
AC^0-isomorphic. Notice that this result proves the isomorphism conjecture for
non-uniform AC^0 reductions but not for Dlogtime-uniform reductions. The uni-
formity condition for isomorphisms was improved first in [AAI+97] to P-uniform
and then in [Agr01] to logspace-uniform, thus proving the isomorphism conjecture
for P-uniform and logspace-uniform AC^0 reductions respectively. More specifi-
cally, for all sets complete under u-uniform AC^0 reductions, [AAI+97] shows
that they are (u+P)-uniform AC^0-isomorphic, while [Agr01] shows that they are
(u+logspace)-uniform AC^0-isomorphic. However, the conjecture remains open
for Dlogtime-uniform AC^0 reductions, which is, in many ways, the correct for-
mulation of the isomorphism conjecture for constant-depth reductions.

In this paper, we prove that all complete sets for NP under u-uniform AC^0
reductions are (u+Dlogtime)-uniform AC^0-isomorphic, thus proving the isomor-
phism conjecture for constant-depth reductions. Since there are a number of al-
ternative characterizations of Dlogtime-uniform AC^0 circuits, this theorem can
be viewed in many interesting ways, e.g., all sets complete under first-order reduc-
tions are first-order isomorphic (first-order functions are computed by first-order
formulae). The above in fact holds for any class closed under TC^0 reductions.

The next section provides an outline of our proof. Section 3 contains defini- 
tions, and the subsequent sections are devoted to proving the result. 

2 Proof Outline 

The overall structure of the proof remains as given in [AAR98]. The proof
in [AAR98] is a three-stage one:

Stage 1 (Gap Theorem): This shows that all complete sets under u-uniform
AC^0 reductions are also complete under non-uniform NC^0 reductions. This
step is non-uniform.

Stage 2 (Superprojection Theorem): This proves that all complete sets un-
der u-uniform NC^0 reductions are also complete under (u+P)-uniform superprojec-
tions, where superprojections are functions similar to projections. This step
is P-uniform.

Stage 3 (Isomorphism Construction): This proves that all complete sets
under u-uniform superprojections are isomorphic under (u+Dlogtime)-
uniform AC^0 isomorphisms. This step is Dlogtime-uniform: starting with
Dlogtime-uniform superprojections, one gets Dlogtime-uniform AC^0 isomor-
phisms.

The proof of the Gap Theorem uses the Switching Lemma of [FSS84] in the con-
struction of NC^0 reductions, and this is the reason for its non-uniformity. In [AAI+97]
the lemma was derandomized using the method of conditional probabilities, making
the stage P-uniform. Improving upon this, in [Agr01], the lemma was deran-
domized by constructing an appropriate pseudo-random generator. This made
the stage logspace-uniform.

The Superprojection Theorem of [AAR98] uses the Sunflower Lemma
of [ER60], which is P-uniform. This construction was replaced in [Agr01] by
a probabilistic construction that could be derandomized via an appropriate
pseudo-random generator. This again resulted in a logspace-uniform construc-
tion.

Clearly, the uniformity of both these stages needs to be improved to obtain
Dlogtime-uniformity. It is useful to note here that we need to make both
stages only AC^0-uniform, as that makes the isomorphism constructed by Stage 3
also AC^0-uniform; then the AC^0 circuit used for uniformity can be incorpo-
rated into the AC^0 circuit for the isomorphism, making the resulting AC^0 circuit
Dlogtime-uniform. In fact, this is the best that we can hope to do, as it is known
that the Gap Theorem cannot be made Dlogtime-uniform [AAR98].

We preserve the idea of [Agr01] of first giving a probabilistic construction and then derandomizing it via an appropriate pseudo-random generator in both stages. The improvement in the uniformity of the first stage is achieved by a careful construction of the required pseudo-random generator, which allows it to become AC^0-uniform. The second stage presents a bigger problem. We replace the probabilistic construction used in [Agr01] by a more involved probabilistic construction and then derandomize it to obtain AC^0-uniformity.

Combining the above constructions with the Isomorphism Construction, we get Dlogtime-uniform AC^0 isomorphisms.

3 Basic Definitions and Preliminaries 

We assume familiarity with the basic notions of many-one reducibility as pre- 
sented, for example, in [BDG88]. 

A circuit family is a set {C_n : n ∈ N} where each C_n is an acyclic circuit with n Boolean inputs x_1, ..., x_n (as well as the constants 0 and 1 allowed as inputs) and some number of output gates y_1, ..., y_r. {C_n} has size s(n) if each circuit C_n has at most s(n) gates; it has depth d(n) if the length of the longest path from input to output in C_n is at most d(n).

For a circuit family {C_n}, the connection set of the family is defined as:

Conn_C = {(n, t, i, j) | gate i in C_n is of type t and takes input from gate j}.
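To make the definition concrete, the sketch below enumerates the connection set of one member of a toy circuit family. The gate encoding (a dictionary from gate index to a type name and the list of gates it reads), the type names, and the helpers `toy_circuit` and `connection_set` are illustrative assumptions, not notation from this paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: gate index -> (type, list of gate indices it reads).
# Inputs x_1..x_n are gates 1..n of type "INPUT" with no predecessors.
Circuit = Dict[int, Tuple[str, List[int]]]

def toy_circuit(n: int) -> Circuit:
    """Toy C_n: the NOT of an unbounded fan-in OR of all n inputs."""
    gates: Circuit = {i: ("INPUT", []) for i in range(1, n + 1)}
    gates[n + 1] = ("OR", list(range(1, n + 1)))  # unbounded fan-in OR
    gates[n + 2] = ("NOT", [n + 1])               # output gate
    return gates

def connection_set(n: int, circuit: Circuit) -> Set[Tuple[int, str, int, int]]:
    """Length-n slice of Conn_C: all (n, t, i, j) with gate i of type t reading gate j.

    Input gates read nothing, so they contribute no tuples.
    """
    return {(n, t, i, j) for i, (t, preds) in circuit.items() for j in preds}

conn = connection_set(3, toy_circuit(3))
print(sorted(conn))
```

A machine witnessing u-uniformity has to decide membership of a given tuple (n, t, i, j) within the bound u; explicit enumeration, as here, only shows what the set contains.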

A family {C_n} is u-uniform if the connection set can be computed by a machine (or circuit) with resource bound u. In this paper, we consider two notions of uniformity: Dlogtime-uniformity [BIS90] and AC^0-uniformity. In the first, the connection set is computed by a TM with random-access tapes working in O(log n) time (which is linear time as a function of the input size), and in the second, the connection set is computed by an AC^0 circuit of polynomial size (which is exponential size in terms of the input size). We follow the standard convention that whenever the connection set is computed by a circuit family, that circuit family is assumed to be Dlogtime-uniform. So, for example, AC^0-uniform means that the connection set can be computed by a Dlogtime-uniform AC^0 family of circuits.

A function f is said to be in AC^0 if there is a circuit family {C_n} of size n^{O(1)} and depth O(1), consisting of unbounded fan-in AND and OR gates and NOT gates, such that for each input x of length n, the output of C_n on input x is f(x). We adopt the following specific convention for interpreting the output of such a circuit: each C_n has n^k + k log n output bits (for some k). The last k log n output bits are viewed as a binary number r, and the output produced by the circuit is the binary string contained in the first r output bits. It is easy to verify that this convention is AC^0-equivalent to any other reasonable convention that allows for variable-sized output, and for us it has the advantage that only O(log n) output bits are used to encode the length.
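A minimal sketch of this output convention, assuming n is a power of two (so that k log n is an integer) and that r is written most significant bit first; both assumptions, and the name `decode_output`, are ours.

```python
def decode_output(bits: str, n: int, k: int) -> str:
    """Interpret the n^k + k*log2(n) output bits of C_n under the length convention."""
    log_n = n.bit_length() - 1
    assert n >= 2 and 1 << log_n == n, "sketch assumes n is a power of two, n >= 2"
    data_len = n ** k
    assert len(bits) == data_len + k * log_n
    r = int(bits[data_len:], 2)  # last k*log n bits, read as a binary number r
    return bits[:r]              # the output is the string in the first r bits

print(decode_output("101111", 4, 1))  # data part "1011", length field "11"
```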

With this definition, the class of Dlogtime-uniform AC^0-computable functions admits many alternative characterizations, including expressibility in first-order logic with {+, ×, <} [Lin92, BIS90], the logspace-rudimentary reductions of Jones [Jon75, AG91], logarithmic-time alternating Turing machines with O(1) alternations [BIS90], and others. This lends additional weight to our choice of this definition.

NC^0 is the class of functions computed in this way by circuit families of size n^{O(1)} and depth O(1), consisting of fan-in two AND and OR gates and NOT gates. Note that for any NC^0 circuit family, there is some constant c such that each output bit depends on at most c different input bits. An NC^0 function is a projection if its circuit family contains no AND or OR gates. For the sake of simplicity, we assume that NC^0 and projection functions do not have variable-sized output. This may seem restrictive at first glance; however, as we show later, at least for complete sets we can ensure this property.

For a complexity class C, a C-isomorphism is a bijection f such that both f and f^{-1} are in C. Since only many-one reductions are considered in this paper, a “C-reduction” is simply a function in C.

(A language is in a complexity class C if its characteristic function is in C. This convention allows us to avoid introducing additional notation such as FAC^0, FNC^1, etc. to distinguish between classes of languages and classes of functions.)

4 AC^0-Uniform Gap Theorem

In this section, we prove the AC^0-uniform version of the Gap Theorem of [AAR98]:

Theorem 1. For any class C closed under reductions, all complete sets for C under u-uniform AC^0 reductions are also complete under (u + AC^0)-uniform NC^0 reductions.

Proof. We begin by outlining the proof in [AAR98] and the improvements of [Agr01], as we make use of both.

Fix a set A in C that is complete under u-uniform AC^0 reductions, and let R ∈ C be an arbitrary set. We need to show that R reduces to A via a (u + AC^0)-uniform NC^0 reduction. We first define a set B, which is a highly redundant version of R, accepted by the following procedure:

On input y, let y = 1^k 0 z. Reject if k does not divide |z|. Otherwise, break z into blocks of k consecutive bits each. Let these be u_1 u_2 u_3 ... u_q. For each i, 1 ≤ i ≤ q, let v_i be the parity of the bits in u_i. Accept iff v_1 v_2 ... v_q ∈ R.
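The membership test for B can be sketched directly from this procedure. R is passed in as a predicate; the example predicate `odd_weight` (strings with an odd number of ones) is a hypothetical stand-in for an arbitrary R, and treating k = 0 as a rejection is our reading of “k does not divide |z|”.

```python
from typing import Callable

def in_B(y: str, in_R: Callable[[str], bool]) -> bool:
    """Decide y in B, where y = 1^k 0 z and B pads R by block parities."""
    k = 0
    while k < len(y) and y[k] == "1":
        k += 1
    if k == len(y):              # no separating 0: y is not of the form 1^k 0 z
        return False
    z = y[k + 1:]
    if k == 0 or len(z) % k != 0:
        return False
    # v_i = parity of the i-th block u_i of k consecutive bits of z
    v = "".join(str(z[i:i + k].count("1") % 2) for i in range(0, len(z), k))
    return in_R(v)

def odd_weight(s: str) -> bool:
    """Hypothetical example R: strings with an odd number of ones."""
    return s.count("1") % 2 == 1

# k = 2, z = 10 11 00, so v = 100, which has odd weight.
print(in_B("110101100", odd_weight))
```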

As one can readily observe, corresponding to each string in R there are infinitely many strings in B. Also, B reduces to R via an NC^1 reduction, and so B ∈ C. Fix a reduction of B to A given by a u-uniform AC^0 circuit family {C_n}, say. Now define a reduction of R to B as follows (it is useful to keep the above definition of B in mind while reading this definition):

Given an input x, |x| = n, let m = n^t for an appropriate constant t to be fixed later. Consider the circuit C_{m/n+1+m} with the first m/n + 1 bits set to 1^{m/n}0, resulting in circuit C'_m, say. Apply the Switching Lemma of [FSS84] to C'_m to obtain a setting of all but Ω(n·(log n)^2) input bits such that the circuit reduces to an NC^0 circuit and, in addition, each of the n blocks of m/n = n^{t-1} consecutive bits in the input has at least (log n)^2 unset bits (it was shown in [AAR98] that this can be ensured, and this is what governs the choice of the constant t). Now set to zero all those unset bits that influence at least one of the last k·log n bits of the output (remember that these bits encode the length of the output as per our convention). This sets O(log n) additional bits. Since each block had (log n)^2 unset bits to begin with, each block still has at least two unset bits. Now, for each of the n blocks, set all but one of the unset bits of the block so that the number of ones among the set bits of the block is 0 modulo 2 (this can always be done, since at least one of the bits being set is free to adjust the parity). This sets all the m bits of input to C'_m except for n bits, and on these n unset bits the circuit C'_m becomes an NC^0 circuit. Now map x to a string of length m/n + 1 + m whose first m/n + 1 bits are set to 1^{m/n}0, whose remaining bits are set according to the above procedure, and in which the unset bit of the i-th block is given the value of the i-th bit of x.

It is easy to verify that the mapping constructed above is indeed a reduction of R to B. Notice that this reduction is simply a projection: each input bit is mapped directly to some output bit, and there are no gates in the circuit computing the reduction. It is also clear that the composition of this reduction with the reduction of B to A is a reduction of R to A that can be computed by an NC^0 circuit family. The uniformity machine (or circuit) for this NC^0 circuit family is required to do the following tasks, apart from generating the circuit C'_m itself:

1. identify the settings of input bits to circuit C'_m that make the circuit an NC^0 circuit,

2. given such a setting, transform the circuit C'_m into the equivalent NC^0 circuit, and

3. set some of the unset bits as outlined above to leave only one unset bit in each block (in which the corresponding bit of string x is placed).

The second task can be done by a Dlogtime-uniform AC^0 circuit that, for each output bit of the circuit C'_m, guesses the O(1) input bits influencing the corresponding NC^0 circuit and then verifies this guess by evaluating C'_m on all possible settings of these bits and checking whether the chosen output bit of C'_m becomes constant for each setting.
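The guess-and-verify step for a single output bit can be pictured with a brute-force sketch: search for a set S of at most c input positions such that, once S is fixed, the output no longer varies. A Dlogtime-uniform AC^0 circuit makes all guesses in parallel; the sequential loops, the callable representation of the output bit, and the names `determines` and `find_support` are assumptions made for illustration.

```python
from itertools import combinations, product
from typing import Callable, Optional, Sequence, Tuple

BoolFn = Callable[[Sequence[int]], int]  # one output bit as a function of m inputs

def determines(f: BoolFn, m: int, S: Tuple[int, ...]) -> bool:
    """True iff the value of f is fixed once the positions in S are fixed."""
    rest = [i for i in range(m) if i not in S]
    for s_vals in product((0, 1), repeat=len(S)):
        seen = set()
        for r_vals in product((0, 1), repeat=len(rest)):
            x = [0] * m
            for i, b in zip(S, s_vals):
                x[i] = b
            for i, b in zip(rest, r_vals):
                x[i] = b
            seen.add(f(x))
        if len(seen) > 1:  # f still varies for this setting of S: reject the guess
            return False
    return True

def find_support(f: BoolFn, m: int, c: int) -> Optional[Tuple[int, ...]]:
    """First (smallest) guess S with |S| <= c that determines f, or None."""
    for size in range(c + 1):
        for S in combinations(range(m), size):
            if determines(f, m, S):
                return S
    return None

print(find_support(lambda x: x[1] & (x[3] ^ 1), 5, 2))
```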

For the third task, a Dlogtime-uniform AC^0 circuit can identify which unset bits influence the output bits coding the length of the output. However, to set the bits in a block appropriately (so that the number of ones is 0 modulo 2), one requires a parity gate, making the overall circuit an NC^1 circuit.

For the first task, we first note that, according to the Switching Lemma of [FSS84], most of the settings work. In [AAI+97], a polynomial-time algorithm was given to identify one such setting given the circuit C'_m, thus making the NC^0 circuit P-uniform. In [Agr01], a pseudo-random generator was constructed that stretches a seed of O(log n) bits to m bits such that on most of the strings output by the generator the circuit C'_m reduces to an NC^0 circuit. Using this, a uniformity machine can be constructed that first generates all possible outputs of the generator (there are only polynomially many, since the seed has O(log n) bits) and then, in parallel, checks which of these is “good” by attempting to transform C'_m to an NC^0 circuit as outlined above. The power of the machine is determined by the difficulty of computing the generator. In [Agr01], the generator can be computed in logspace, making the entire construction logspace-uniform.

To obtain AC^0-uniformity, we need to improve upon both the first and third tasks. For the first task, one can try to obtain a generator that is computable by a Dlogtime-uniform AC^0 circuit. However, improving the third task seems impossible at first glance, as it is well known that computing the parity of n bits cannot be done even by non-uniform AC^0 circuits [FSS84]. We solve this problem by a clever design of the generator: the generator associates a sign (0 or 1) with each unset bit, and the parity of all the set bits and the signs of the unset bits in a block is always zero! This trivializes the third task. The reduction of R to B has to be changed slightly to make this work: map the i-th bit of x to the unset bit of the i-th block if its sign is 0; otherwise, map it to that unset bit after complementing it.
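A minimal sketch of why the signs remove the need for a parity gate: if the set bits of a block have parity equal to the sign s of its unset bit (so that set bits plus sign have parity zero), then writing x_i ⊕ s into the unset bit makes the parity of the whole block equal x_i. The block contents are made-up values and `fill_block` is a hypothetical helper.

```python
from typing import List, Optional

def fill_block(block: List[Optional[int]], sign: int, x_bit: int) -> List[int]:
    """Place x_bit into the single unset position (None), complemented if sign == 1."""
    assert block.count(None) == 1
    set_parity = sum(b for b in block if b is not None) % 2
    # Generator invariant assumed here: set bits plus the sign have even parity.
    assert (set_parity + sign) % 2 == 0
    return [x_bit ^ sign if b is None else b for b in block]

# The set bits 1, 0, 1, 1 have parity 1, so this block carries sign 1.
block = [1, 0, None, 1, 1]
for x_bit in (0, 1):
    filled = fill_block(block, sign=1, x_bit=x_bit)
    print(x_bit, filled, sum(filled) % 2)  # block parity equals x_bit
```

No counting happens at reduction time: each bit of x is copied, possibly negated, which is exactly what a projection allows.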

We now give the generator construction of [Agr01] and then show how to improve it so that both the first and third tasks are completed. The generator is a combination of two types of primitive generators: (1) generators that produce bits that are 1/n^{O(1)}-biased, O(log n)-wise independent [NN90], and (2) generators based on Nisan-Wigderson designs [NW94]. The generator consists of ℓ primitive generators of each type, where ℓ is a constant depending on the depth and size of the circuit C'_m. Each of these primitive generators requires a seed of length O(log n), and therefore the seed length of the generator is O(log n). The generator is constructed as follows:

Let G^1_IND, ..., G^ℓ_IND be ℓ primitive generators producing bits that are 1/n^{O(1)}-biased, O(log n)-wise independent, and let G^1_NW, ..., G^ℓ_NW be ℓ primitive generators based on NW-designs (their construction will be discussed later). The i-th bit of the generator G of [Agr01] is computed as follows: write i in binary and let i = i_1 i_2 ... i_{ℓ+1}, where |i_j| = |i|/2^j for 1 ≤ j ≤ ℓ (we assume |i| to be a power of two to avoid complications). Compute the bit G^j_NW[i_j i_{j+1} ... i_{ℓ+1}] for 1 ≤ j ≤ ℓ (we use G[k] to denote the k-th bit of function G). Let j_0 be the first j for which G^j_NW[i_j i_{j+1} ... i_{ℓ+1}] is zero.

If there exists such a j_0, then let G[i] = ⊕_{1 ≤ j ≤ j_0} G^j_IND[i_j i_{j+1} ... i_{ℓ+1}].
If there is no such j_0, then leave G[i] unset and compute its sign as



The following lemma was proved for this generator in [Agr01]^1:

Lemma 1. [Agr01] Let C be any AC^0 circuit of depth and size bounded by those of C'_m, having m input bits. Then on at least half of the outputs of G, C reduces to an NC^0 circuit.



It is clear from the construction of G above that the computational resources required for G depend on the resources required to compute the two types of primitive generators. In particular, if both types of primitive generators can be computed by Dlogtime-uniform AC^0 circuits, the generator G can also be computed by such circuits.
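The combination rule can be sketched as follows, reading i_j i_{j+1} ... i_{ℓ+1} as the suffix of i that starts at chunk j and taking |i_j| = |i|/2^j for j ≤ ℓ. The primitive generators are passed in as callables from an index string to a bit; they are hypothetical stand-ins for G^j_NW and G^j_IND. The sketch returns None for an unset position and does not compute its sign.

```python
from typing import Callable, List, Optional

Prim = Callable[[str], int]  # hypothetical primitive generator: index string -> bit

def split_index(i_bits: str, ell: int) -> List[str]:
    """Split i into i_1 ... i_{ell+1} with |i_j| = |i| / 2^j for j <= ell."""
    L = len(i_bits)
    assert L & (L - 1) == 0 and L >> ell >= 1, "assumes |i| a power of two, |i| >= 2^ell"
    chunks, pos = [], 0
    for j in range(1, ell + 1):
        size = L >> j
        chunks.append(i_bits[pos:pos + size])
        pos += size
    chunks.append(i_bits[pos:])  # i_{ell+1}: the remaining |i| / 2^ell bits
    return chunks

def G_bit(i_bits: str, nw: List[Prim], ind: List[Prim]) -> Optional[int]:
    """Value of G[i], or None if position i is left unset."""
    ell = len(nw)
    chunks = split_index(i_bits, ell)
    acc = 0
    for j in range(ell):              # 0-based j here; the text's index is j + 1
        suffix = "".join(chunks[j:])  # i_j i_{j+1} ... i_{ell+1}
        acc ^= ind[j](suffix)         # running XOR of the G_IND bits up to j
        if nw[j](suffix) == 0:        # j0 = first j whose G_NW bit is zero
            return acc
    return None                       # no such j0: the bit stays unset

print(split_index("10110010", 2))
```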

Let us now see constructions for these primitive generators. Three simple constructions of 1/n^{O(1)}-biased, O(log n)-wise independent generators are given in [AGHP90]. We choose the one based on quadratic residues in a small field (in [Agr01] a different generator is used): the i-th bit of G^j_IND is 1 iff the number s_j + i is a quadratic non-residue in the field F_p for a suitable prime p (here s_j is the seed). This can be done by a Dlogtime-uniform AC^0 circuit: first an appropriate prime p is computed (fixing the field F_p), then s_j is added to i (addition is modulo p), and finally it is checked whether there exists an x such that x^2 = s_j + i. All these computations can be done by Dlogtime-uniform AC^0 circuits since the field size is small (as shown in [BIS90]).
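A sketch of the quadratic-residue bit rule described above. The prime p = 11 and the seed value 3 are toy choices for illustration only (the bias guarantee of [AGHP90] needs p chosen as described there), and the brute-force search for a square root mirrors the “there exists an x” check rather than being efficient.

```python
def is_qnr(a: int, p: int) -> bool:
    """True iff a is a quadratic non-residue modulo the odd prime p."""
    a %= p
    if a == 0:
        return False  # 0 is neither a residue nor a non-residue
    # Mirror the circuit's check: is there an x with x^2 = a (mod p)?
    return all((x * x) % p != a for x in range(1, p))

def qr_generator_bit(seed: int, i: int, p: int) -> int:
    """i-th output bit for seed s: 1 iff s + i (mod p) is a quadratic non-residue."""
    return int(is_qnr(seed + i, p))

p = 11  # toy prime
bits = [qr_generator_bit(3, i, p) for i in range(8)]
# Cross-check with Euler's criterion: a^((p-1)/2) = -1 (mod p) exactly for non-residues.
assert all(b == int(pow((3 + i) % p, (p - 1) // 2, p) == p - 1) for i, b in enumerate(bits))
print(bits)
```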

The generator G^j_NW is defined as follows: let z = z_1 z_2 with |z_1| = |z_2| = k; let the seed be s_j = s_1 s_2 ... s_c with |s_1| = |s_2| = ... = |s_c| = k for an appropriate constant c; compute z' = Σ_{e=1}^{c} s_e · (z_2)^{e-1}, where all the operations are over the field F_{2^k}; and set the z-th bit to 1 iff z' = z_1. Again, all the computations here can be done by a Dlogtime-uniform AC^0 circuit (shown in [BIS90]).
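A sketch of this NW-design generator over F_{2^k}. The field F_16 is represented with the irreducible polynomial x^4 + x + 1, z_1 is taken to be the high-order half of z, and the seed is a toy choice; all three are assumptions for illustration. Horner's rule evaluates Σ_e s_e · z_2^{e-1}.

```python
def gf_mul(a: int, b: int, k: int, modulus: int) -> int:
    """Multiply a and b in GF(2^k), reducing by the irreducible polynomial `modulus`."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if a >> k:       # degree reached k: reduce modulo the field polynomial
            a ^= modulus
    return result

def nw_bit(z: int, seed: list, k: int, modulus: int) -> int:
    """z-th bit: split z (2k bits) as z1 z2; output 1 iff sum_e s_e * z2^(e-1) == z1."""
    z1, z2 = z >> k, z & ((1 << k) - 1)
    acc = 0
    for s_e in reversed(seed):  # Horner: ((s_c * z2 + s_{c-1}) * z2 + ...) + s_1
        acc = gf_mul(acc, z2, k, modulus) ^ s_e
    return int(acc == z1)

K, MOD = 4, 0b10011  # GF(16) via x^4 + x + 1
seed = [5, 9, 2]     # toy seed s_1, s_2, s_3 (c = 3)
ones = [z for z in range(1 << (2 * K)) if nw_bit(z, seed, K, MOD)]
print(len(ones))
```

For each value of z_2 exactly one z_1 matches, so the generator outputs exactly one 1 among the 2^k strings that share a given z_2.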

Thus the generator G can be computed by a Dlogtime-uniform AC^0 circuit.

The primitive generator G^1_NW sets exactly one bit to 1 in consecutive blocks of m^{1/2} bits, the generator G^2_NW sets exactly one bit to 1 in consecutive blocks of m^{1/4} of the unset bits remaining, and so on. Thus the generator G leaves exactly one unset bit in consecutive blocks of m^{1/2} · m^{1/4} ⋯ m^{1/2^ℓ} = m^{1-1/2^ℓ} bits, which makes a total of m^{1/2^ℓ} unset bits. Now choosing t = 2^{ℓ+1} (recall that m = n^t) ensures that each of the n blocks contains exactly n > (log n)^2 unset bits, as required.
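The counts in the previous paragraph follow from m = n^t with t = 2^{ℓ+1}; a short check:

```latex
m^{1/2^{\ell}} = n^{2^{\ell+1}/2^{\ell}} = n^{2},
\qquad
m^{1-1/2^{\ell}} = n^{t - t/2^{\ell}} = n^{t-2},
\qquad
\frac{m/n}{m^{1-1/2^{\ell}}} = \frac{n^{t-1}}{n^{t-2}} = n .
```

So the n^2 unset bits are spread as n per block of m/n = n^{t-1} consecutive bits.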

However, this generator does not guarantee that the parity of all the set bits and signs of the unset bits in a block is zero (the length of a block is m/n = n^{t-1}). To achieve this, we change the definition of the generators G^j_IND in such a way that the parity of all the bits that each of them contributes to the setting of bits and signs in a block is zero. Notice that each generator G^j_IND contributes a fixed number of bits to a block. Change the generator by replacing every (f·log n)-th bit by the parity of the previous f·log n − 1 bits, for a large enough constant f (f should be chosen so that it is a power of two and the generator is required to be f'·log n-wise independent for f' < f). Without loss of generality, we can assume that n is of the form 2^{2^a}, and then, since f is a power of two, f·log n divides n. This ensures that the parity of all the bits contributed by the modified generator to a block is zero. However, we now need to show that the modified generator is still 1/n^{O(1)}-biased, O(log n)-wise independent. This follows immediately from the fact that any set of fewer than f·log n bits of the modified generator is still 1/n^{O(1)}-biased (this follows from [AGHP90]), and therefore these bits are independent with a similar bias (shown in [NN90]). Thus, with these modified primitive generators, the generator G satisfies all the required conditions, proving the theorem. □

^1 Actually, in [Agr01] the generator construction is slightly different: G[i] is simply set to G^{j_0}_IND[i_{j_0} ... i_{ℓ+1}] when j_0 exists. However, the lemma holds for this modification too, and this modification makes our current construction simpler.
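A minimal sketch of the parity patch from the proof above, with the group size f·log n passed in as a single parameter `group` (the value 8 in the example is an arbitrary choice, e.g. f = 2 and log n = 4). The sketch checks only the even-parity property; the claim that the patched bits remain small-biased rests on [AGHP90] and [NN90] and is not tested here.

```python
import random
from typing import List

def patch_parity(bits: List[int], group: int) -> List[int]:
    """Replace every group-th bit by the parity of the preceding group - 1 bits.

    `group` stands for f * log n; len(bits) is assumed to be a multiple of it.
    """
    assert len(bits) % group == 0
    out = list(bits)
    for start in range(0, len(out), group):
        out[start + group - 1] = sum(out[start:start + group - 1]) % 2
    return out

rng = random.Random(1)
raw = [rng.randint(0, 1) for _ in range(32)]
patched = patch_parity(raw, group=8)
# Every aligned group, and hence every union of whole groups, has even parity.
print(all(sum(patched[s:s + 8]) % 2 == 0 for s in range(0, 32, 8)))
```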

5 AC^0-Uniform Superprojection Theorem

We start with the definition of a superprojection [AAR98].

Definition 1. An NC^0 reduction {C_n} is a superprojection if the circuit that results by deleting zero or more of the output bits in each C_n is a projection wherein each input bit (or its negation) is mapped to some output.
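Definition 1 can be checked by brute force on small examples: compute, for every output bit, the set of inputs it depends on, keep the outputs that depend on exactly one input (a non-constant function of one bit is that bit or its negation), and test whether every input bit is covered. The function representation and the names `dependencies` and `is_superprojection` are assumptions for this sketch.

```python
from itertools import product
from typing import Callable, List, Sequence, Set

Fn = Callable[[Sequence[int]], List[int]]  # hypothetical: input bits -> output bits

def dependencies(f: Fn, n: int) -> List[Set[int]]:
    """For each output bit, the set of input positions it depends on (brute force)."""
    n_out = len(f([0] * n))
    deps: List[Set[int]] = [set() for _ in range(n_out)]
    for x in product((0, 1), repeat=n):
        y = f(x)
        for j in range(n):
            flipped = list(x)
            flipped[j] ^= 1
            y2 = f(flipped)
            for o in range(n_out):
                if y[o] != y2[o]:
                    deps[o].add(j)
    return deps

def is_superprojection(f: Fn, n: int) -> bool:
    """Outputs depending on exactly one input form a projection after deleting the rest;
    f is a superprojection iff those outputs cover every input bit."""
    covered: Set[int] = set()
    for d in dependencies(f, n):
        if len(d) == 1:
            covered |= d
    return covered == set(range(n))

f_good = lambda x: [x[0] & x[1], 1 - x[0], x[1], x[2]]
f_bad = lambda x: [x[0] & x[1], 1 - x[0], x[1], x[2] ^ x[1]]
print(is_superprojection(f_good, 3), is_superprojection(f_bad, 3))
```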

Now we prove the AC^0-uniform Superprojection Theorem:

Theorem 2. For any class C closed under reductions, all complete sets for C under u-uniform NC^0 reductions are also complete under (u + AC^0)-uniform superprojections.

Proof. Fix a set A in C that is complete under u-uniform NC^0 reductions, and let R ∈ C be an arbitrary set. We need to show that R reduces to A via a (u + AC^0)-uniform superprojection. We first define, as before, a set B accepted by the following procedure:

On input y, let y = z'11z such that z' ∈ {00, 01, 10}*. Break z' into pairs of bits. Ignoring all the 00 pairs, consider the first log|z| pairs. Define the number k by setting the i-th bit of k to 1 if the i-th of the above log|z| pairs is 10, and to 0 otherwise. Reject if k does not divide |z|. Else, break z into blocks of k consecutive bits each. Reject if the number of blocks is not a multiple of four. Else, let z = u_1 u_2 u_3 ... u_{4q} with |u_i| = k. Let v_i be the parity of the bits in u_i. Let w_i = v_{4i-3} v_{4i-2} v_{4i-1} v_{4i} for 1 ≤ i ≤ q (so each w_i is a four-bit string). If w_i = 1111 for any 1 ≤ i ≤ q, accept. Else, if some w_i has exactly three ones, reject. Else, for each i, 1 ≤ i ≤ q, let b_i = 1 if w_i has exactly two ones, b_i = 0 if w_i has exactly one one, and b_i = ε otherwise. Accept iff b_1 b_2 ... b_q ∈ R.
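The procedure above can be sketched as a membership test. Several details are our assumptions rather than part of the definition: log|z| is rounded down, an instance with fewer than log|z| non-00 pairs (or with k = 0) is rejected, and the i-th pair supplies the i-th least significant bit of k. The predicate `odd_weight` is a hypothetical stand-in for R.

```python
from typing import Callable, Optional, Tuple

def parse_instance(y: str) -> Optional[Tuple[str, str]]:
    """Split y = z' 11 z, scanning z' pair by pair (its pairs are never 11)."""
    pos = 0
    while pos + 1 < len(y) and y[pos:pos + 2] != "11":
        pos += 2
    if y[pos:pos + 2] != "11":
        return None
    return y[:pos], y[pos + 2:]

def in_B2(y: str, in_R: Callable[[str], bool]) -> bool:
    parsed = parse_instance(y)
    if parsed is None:
        return False
    zp, z = parsed
    pairs = [zp[i:i + 2] for i in range(0, len(zp), 2) if zp[i:i + 2] != "00"]
    L = len(z).bit_length() - 1  # log|z|, rounded down (assumption)
    if L < 1 or len(pairs) < L:
        return False             # too few non-00 pairs: reject (assumption)
    # Bit i of k comes from the i-th remaining pair (least significant first, assumed).
    k = sum(1 << i for i, pr in enumerate(pairs[:L]) if pr == "10")
    if k == 0 or len(z) % k != 0 or (len(z) // k) % 4 != 0:
        return False
    v = [z[i:i + k].count("1") % 2 for i in range(0, len(z), k)]
    w = [sum(v[i:i + 4]) for i in range(0, len(v), 4)]  # number of ones in each w_i
    if 4 in w:
        return True
    if 3 in w:
        return False
    # two ones -> 1, one one -> 0, no ones -> empty string (b_i = epsilon)
    b = "".join("1" if c == 2 else "0" if c == 1 else "" for c in w)
    return in_R(b)

def odd_weight(s: str) -> bool:
    return s.count("1") % 2 == 1

# k = 1 is coded by the pairs 10 01 01; z encodes x = 10 as groups "1100" "0100".
print(in_B2("100101" + "11" + "11000100", odd_weight))
```

The test strings follow the final step of the construction in this proof: with block length 1, each group of the form x_i 1 0 0 decodes to b_i = x_i.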

The definition of set B is more complicated than the previous one. Even the block size (= k) is coded in the string in a non-straightforward way. We refer to the bits of z' of any instance y of B as length encoder bits and to the bits of z as string encoder bits. It is easy to see that B reduces to R via an NC^1 reduction, and so B ∈ C. Fix a reduction of B to A given by a u-uniform NC^0 circuit family {C_n}, say. Let each output bit of any circuit C_n depend on at most c input bits.

As before, we now define a reduction of R to B. The idea is the same: for an appropriate m and ℓ, consider the circuit C_{ℓ+2+m}. Set some of the input bits of C_{ℓ+2+m} so that the circuit on the remaining unset bits is a superprojection. Now set some more bits (including all of the length encoder bits) to satisfy all the conditions in the definition of set B, and finally map string x to the remaining unset bit positions. In [Agr01], a simple random construction for this was given, and the construction was then derandomized using an appropriate generator. However, the random construction did not guarantee that every block would still contain at least one unset bit after all the settings (this is why we need “empty” blocks, for which b_i = ε). This makes the mapping of the bits of x difficult, as we need threshold gates to find the i-th unset bit. This was not the case in the previous proof: every block there had an unset bit, and so the i-th unset bit can be identified by an AC^0 circuit on the bits of the block.

We give a different construction to solve this problem. Interestingly, our construction uses the central idea of the Switching Lemma proof of [FSS84], which is also used in the construction in the Gap Theorem.

We first discuss a simple idea (one that is used in [Agr01]) and see why it does not work.

Consider circuit C_{ℓ+2+m}. Randomly leave every input bit of the circuit unset with some fixed probability depending only on c, and otherwise set it to 0 or 1 with equal probability. Say that an input bit in the string encoder part is good if it remains unset and there is at least one output bit that now depends only on this bit. For any input bit that influences some output bit in C_{ℓ+2+m}, the probability that this bit is good is bounded below by a positive constant depending only on c. Therefore, the expected number of good input bits is Ω(m'), where m' is the number of input bits in the string encoder part of C_{ℓ+2+m} that influence at least one output bit. Identify all the good bits and set all the other unset input bits appropriately. This makes the circuit C_{ℓ+2+m} on the remaining unset bits a superprojection.
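A small simulation of this random restriction, under simplifying assumptions of our own: every output is modelled as the XOR of c input positions (so an output whose inputs are all set except j depends exactly on j), the unset probability is 1/c, and the circuit is random toy data. It illustrates that a constant fraction of bits tend to be good; it is not the analysis used in the proof.

```python
import random
from typing import List, Optional, Set, Tuple

def random_restriction(m: int, p_unset: float, rng: random.Random) -> List[Optional[int]]:
    """Leave each bit unset (None) with probability p_unset, else set it to a uniform 0/1."""
    return [None if rng.random() < p_unset else rng.randint(0, 1) for _ in range(m)]

def good_bits(outputs: List[Tuple[int, ...]], rho: List[Optional[int]]) -> Set[int]:
    """Unset bits j such that some output's only remaining unset input is j.

    Outputs are modelled as XORs of their inputs (an assumption), so such an
    output depends exactly on j.
    """
    good: Set[int] = set()
    for inputs in outputs:
        unset = [j for j in inputs if rho[j] is None]
        if len(unset) == 1:
            good.add(unset[0])
    return good

rng = random.Random(0)
m, c = 200, 3
outputs = [tuple(rng.sample(range(m), c)) for _ in range(400)]  # toy c-local circuit
rho = random_restriction(m, 1 / c, rng)
print(len(good_bits(outputs, rho)), "good bits out of", m)
```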

The above construction yields Ω(m) good bits provided we can ensure that nearly all the input bits influence the output (part of the complexity in the definition of B is due to this requirement). The construction can easily be derandomized by using a 2c-wise independent generator for selecting unset bits and setting the remaining bits. However, the problem pointed out earlier remains: it cannot be ensured that every block has at least one unset bit.

In our construction, we do use the above construction, but only after making sure that every block is guaranteed to have at least one unset bit. For this, we successively shrink the block size, simultaneously making “bad” blocks (i.e., those that do not have unset bits) “empty.” This is why, unlike in the previous proof, we cannot fix the block size at the beginning of the construction.
Let x, |x| = n, be an instance of R. Let m = (4n^2)^c. Consider the circuit C_{4^° log m+2+m}. To begin with, set the bits numbered 4^° log m + 1 and 4^° log m + 2 of the input to C_{4^° log m+2+m} to 1, identifying the first 4^° log m bits as length encoder bits and the last m bits as string encoder bits. Let C be the resulting circuit.

Split the string encoder bits of the input to C into 4n blocks of equal size (= n·(4n^2)^{c-1}). First, we notice that every bit in every block must influence some output bit. Suppose not, and let such a bit belong to the (4i + j)-th block. Set all the bits in all the blocks except for block numbers 4i + 1 through 4i + 4 so that the parity of bits in every block is zero. Set the bits in blocks 4i + 1 through 4i + 4, except those in block 4i + j, so that the parity of bits in each of these blocks is one. Set all the bits in block 4i + j, except the bit that does not influence any output bit, so that the parity of the set bits is zero. This fixes the output of circuit C. However, the value of the lone unset bit decides whether the input string belongs to the set B or not (w_{i+1} becomes either 1111 or a string with exactly three ones, while every other w is 0000), contradicting the fact that the family {C_n} computes a reduction of B to A.

Apply a random restriction to the input bits of C as outlined above, using a small-bias, 4^'^ log n-wise independent source to generate the restriction (instead of a 2c-wise independent generator; the reason for this will become clear soon). There are two cases that can arise now:

Case 1. For every seed value of the generator, there is at least one block with 
no good bit. 

Case 2. There is a seed of the generator that leaves at least one good bit in 
every block. 

We tackle Case 1 first. Undo the above random restriction. Divide each block into n sub-blocks of equal size (= (4n^2)^{c-1}). For each sub-block, do the following experiment: set all bits in all the other sub-blocks and blocks to zero, and then check whether, by setting an additional 4^° log n length encoder bits to zero, all the output bits of the circuit C come to depend on at most c − 1 unset bits. We show later that such a sub-block must exist. Fix any such sub-block. Set all bits in all other sub-blocks to zero, and also the length encoder bits identified for this sub-block, resulting in a circuit whose every output bit depends only on at most c − 1 input bits. For each length encoder bit set to zero, set its paired bit to zero as well (rendering these pairs ineffectual for encoding length). We are now left with exactly (4n^2)^{c-1} unset string encoder bits and at least 4^° log n − 2·4^'^ log n unset length encoder bits.

Apply the random restriction to these bits and repeat the same process. If Case 1 keeps occurring, then after c − 1 iterations we are left with exactly 4n^2 unset string encoder bits and at least (4^° − 2(c − 1)·4^'^)·log n > 2c·log 2n = log m length encoder bits. Moreover, every output bit of C now depends on at most one unset bit, so the circuit C is simply a projection on these unset bits! We can now fix the length encoder bits to code the block length as 1, set all the remaining length encoder bits to zero, set all but the first 4n of the unset string encoder bits to zero, set the last three bits in every group of 4 unset bits to 100, and map the i-th bit of x to the first bit of the i-th group. This defines a projection reduction of R to B, and on outputs of this reduction, circuit C is also a projection. Therefore, their composition is a projection (we shall see later that this composition can be computed by an AC^0-uniform circuit).

The other possibility is that Case 2 occurs after some iterations. In that case, identify a seed on which the generator output leaves at least one good bit in each block. Set the length encoder bits to code the current block length (we argue later that this can always be done). In every block, set all the bits except one good bit to zero. Now map the string x to these good bits as above. Use the modified generator of the previous proof so that the parity of all the set bits plus the signs in each block is zero (to incorporate the sign, we just need to XOR it with the settings of all the unset bits). This defines a projection reduction of R to B whose composition with C is a superprojection.

There are two things remaining to be done: (1) we need to show that when Case 1 occurs there exists a sub-block with the desired properties, and that when Case 2 occurs enough unset length encoder bit pairs are available; (2) we need to show that the above construction can be done by a Dlogtime-uniform AC^0 circuit. The second is easy to show: we have already seen that the generator output can be computed by a Dlogtime-uniform AC^0 circuit, and the remaining tasks can easily be done by a Dlogtime-uniform AC^0 circuit.

To show the first, we make use of the central idea in [FSS84]. For a block, let o_1, ..., o_p be all the output bits of C that depend on some bit in the block. For output bit o_i, let I_i be the set of input bits that influence o_i. Clearly, |I_i| ≤ c. On a random restriction as defined above, the probability that a bit in the block belonging to I_i becomes good due to I_i is at least some constant δ > 0 depending only on c. Let MaxSet be any maximal set of disjoint I_i's. If |MaxSet| > 4^° log n, then drop some I_i's from it to make |MaxSet| = 4^'^ log n. Since the restriction bits are 4^'^ log n-wise independent (with a small bias, of course) and MaxSet contains at most c·4^° log n < 4^° log n bits, the probability that at least one of the bits in the block belonging to some I_i in MaxSet becomes good is, up to the small bias, at least 1 − (1 − δ)^{|MaxSet|}. If each one of the 4n blocks has this property, then, by a union bound over the 4n blocks, the probability that each one of them has at least one good bit is bounded away from zero, so some seed of the generator achieves this. The same calculation works when we drop from MaxSet those I_i's that contain a bit from the first log m unset pairs of length encoder bits (we drop at most 4c log 2n I_i's). This is Case 2: we can keep a sufficient number of length encoder bit pairs unset and still have at least one good bit in every block.

Now consider the other possibility: there is a block with |MaxSet| < 4^° log n. Divide this block into n sub-blocks of equal size as described above. Clearly, one of these sub-blocks contains no bit that belongs to MaxSet, since MaxSet has fewer than c·4^'^ log n bits, which is less than n for large n. Fix a sub-block that does not intersect MaxSet. Now if we set all the bits of all the other blocks and sub-blocks, and also at most c·4^° log n < 4^° log n length encoder bits (the ones that belong to MaxSet), then all the bits in MaxSet are set. By the maximality of MaxSet, every I_i intersects some set in MaxSet, so each I_i now contains at most c − 1 unset bits. This is Case 1! □
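A greedy scan is one simple way to build a maximal family of pairwise disjoint sets, and the sketch below checks the property used in the last step: every I_i meets some member of MaxSet, so fixing all bits of MaxSet leaves each I_i with at most |I_i| − 1 unset bits. The sets are toy data and `greedy_max_disjoint` is a hypothetical helper.

```python
from typing import List, Set

def greedy_max_disjoint(sets: List[Set[int]]) -> List[Set[int]]:
    """A maximal (not necessarily maximum) family of pairwise disjoint sets."""
    chosen: List[Set[int]] = []
    used: Set[int] = set()
    for s in sets:
        if not (s & used):  # disjoint from everything chosen so far
            chosen.append(s)
            used |= s
    return chosen

I = [{0, 1}, {1, 2}, {3}, {2, 4}, {4, 5}, {5, 6}]
max_set = greedy_max_disjoint(I)
covered = set().union(*max_set)
# Maximality: every I_i meets some chosen set.
print(max_set, all(s & covered for s in I))
```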




The First-Order Isomorphism Theorem 






References 



AAI+97. M. Agrawal, E. Allender, R. Impagliazzo, T. Pitassi, and S. Rudich. Reducing the complexity of reductions. In Proceedings of Annual ACM Symposium on the Theory of Computing, pages 730-738, 1997.

AAR98. M. Agrawal, E. Allender, and S. Rudich. Reductions in circuit complexity: An isomorphism theorem and a gap theorem. J. Comput. Sys. Sci., 57:127-143, 1998.

ABI93. E. Allender, J. Balcázar, and N. Immerman. A first-order isomorphism theorem. In Proceedings of the Symposium on Theoretical Aspects of Computer Science, 1993.

AG91. E. Allender and V. Gore. Rudimentary reductions revisited. Information Processing Letters, 40:89-95, 1991.

AGHP90. N. Alon, O. Goldreich, J. Håstad, and R. Peralta. Simple constructions of almost k-wise independent random variables. In Proceedings of Annual IEEE Symposium on Foundations of Computer Science, pages 544-553, 1990.

Agr96. M. Agrawal. On the isomorphism problem for weak reducibilities. J. Comput. Sys. Sci., 53(2):267-282, 1996.

Agr01. M. Agrawal. Towards uniform AC^0 isomorphisms. In Proceedings of the Conference on Computational Complexity, 2001. To be presented.

BDG88. J. Balcázar, J. Díaz, and J. Gabarró. Structural Complexity I. EATCS Monographs on Theoretical Computer Science. Springer-Verlag, 1988.

BH77. L. Berman and J. Hartmanis. On isomorphisms and density of NP and other complete sets. SIAM Journal on Computing, 6:305-322, 1977.

BIS90. D. Barrington, N. Immerman, and H. Straubing. On uniformity within NC^1. J. Comput. Sys. Sci., 41:274-306, 1990.

CSV84. A. Chandra, L. Stockmeyer, and U. Vishkin. Constant depth reducibility. SIAM Journal on Computing, 13:423-439, 1984.

ER60. P. Erdős and R. Rado. Intersection theorems for systems of sets. J. London Math. Soc., 35:85-90, 1960.

FFK92. S. Fenner, L. Fortnow, and S. Kurtz. The isomorphism conjecture holds relative to an oracle. In Proceedings of Annual IEEE Symposium on Foundations of Computer Science, pages 30-39, 1992. To appear in SIAM J. Comput.

FSS84. M. Furst, J. Saxe, and M. Sipser. Parity, circuits, and the polynomial-time hierarchy. Mathematical Systems Theory, 17:13-27, 1984.

IL95. N. Immerman and S. Landau. The complexity of iterated multiplication. Information and Computation, 116:103-116, 1995.

Imm87. N. Immerman. Languages that capture complexity classes. SIAM Journal on Computing, 16:760-778, 1987.

Jon75. N. Jones. Space-bounded reducibility among combinatorial problems. J. Comput. Sys. Sci., 11:68-85, 1975.

KMR88. S. Kurtz, S. Mahaney, and J. Royer. The structure of complete degrees. In A. Selman, editor, Complexity Theory Retrospective, pages 108-146. Springer-Verlag, 1988.

KMR89. S. Kurtz, S. Mahaney, and J. Royer. The isomorphism conjecture fails relative to a random oracle. In Proceedings of Annual ACM Symposium on the Theory of Computing, pages 157-166, 1989.

Lin92. S. Lindell. A purely logical characterization of circuit complexity. In Proceedings of the Structure in Complexity Theory Conference, pages 185-192, 1992.







Manindra Agrawal 



NN90. J. Naor and M. Naor. Small-bias probability spaces: Efficient constructions 
and applications. In Proceedings of Annual ACM Symposium on the Theory 
of Computing, pages 213-223, 1990. 

NW94. N. Nisan and A. Wigderson. Hardness vs. randomness. J. Comput. Sys. 
Sci., 49(2):149-167, 1994. 

Sip83. M. Sipser. Borel sets and circuit complexity. In Proceedings of Annual ACM 
Symposium on the Theory of Computing, pages 61-69, 1983. 




Thresholds and Optimal Binary Comparison Search Trees

(Extended Abstract)

Richard Anderson^1, Sampath Kannan^2,*, Howard Karloff^3,**, and Richard E. Ladner^4,***

^1 Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195. anderson@cs.washington.edu.

^2 AT&T Labs-Research and Department of CIS, University of Pennsylvania, Philadelphia, PA 19104. kannan@cis.upenn.edu.

^3 AT&T Labs-Research, Room C231, 180 Park Ave., Florham Park, NJ 07932, and College of Computing, Georgia Institute of Technology, 801 Atlantic Dr., Atlanta, GA 30332-0280. howard@research.att.com.

^4 Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195. ladner@cs.washington.edu.



Abstract. We present an $O(n^4)$-time algorithm for the following problem:
Given a set of items with known access frequencies, find the optimal
binary search tree under the realistic assumption that each comparison
can only result in a two-way decision: either an equality comparison or
a less-than comparison. This improves the best known result of $O(n^5)$
time, which is based on split tree algorithms. Our algorithm relies on
establishing thresholds on the frequency of an item that can occur as an
equality comparison at the root of an optimal tree.



1 Introduction 

The binary search tree (BST) is one of the classic data structures in computer 
science. One of the fundamental problems in this area is how to build an optimal 
binary search tree where the items stored in the tree have some observed fre- 
quencies of access. In addition, there may be failure frequencies for unsuccessful 
searches. In the traditional problem each node in the binary search tree is la- 
beled with an item a. In a search for an item x, when the node labeled with a is 
encountered there are three possible outcomes: x < a, x = a, and x > a. In the
first case the search proceeds to the left subtree, in the second case the search 
ends at the node, and in the third case the search proceeds to the right subtree. 
A disadvantage of this three-way branching is that it takes two computer in- 
structions to determine the outcome. Knuth [9, Problem 33, page 457] suggests 
that it would be interesting to explore an alternative binary tree structure where 
nodes are labeled with either an equality comparison or a less-than comparison. 
The resulting tree has two-way branching instead of three-way branching. We 
call such trees binary comparison search trees or BCSTs for short. 

A very simple example demonstrates the benefit of having equality comparisons.
Consider three items with weights (0, 1, 0). The optimal BCST with equality
comparisons has cost 1 (test $x = a_2$? at the root), while the optimal tree
using only less-than comparisons has cost 2. (We can replace the 0's by
arbitrarily small $\epsilon$'s to make this example less pathological.)
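The (0, 1, 0) example can be checked mechanically. The Python sketch below is our own illustration, not the paper's algorithm. It finds an optimal BCST cost by exhaustive memoized recursion over the set of items still consistent with the answers so far, and it charges each comparison the total weight of that set, which is equivalent to summing weight times depth over the leaves. The function name `bcst_cost` and the flag `allow_equality` are hypothetical, and items are indexed from 0. Because the running time is exponential, it is suitable only for tiny inputs.

```python
from functools import lru_cache
from typing import Sequence, Tuple


def bcst_cost(weights: Sequence[int], allow_equality: bool = True) -> int:
    """Optimal BCST cost by brute force over item subsets (tiny inputs only).

    Each comparison applied to a candidate set S costs the total weight of S,
    which equals summing weight * depth over the leaves.
    """
    w = tuple(weights)

    @lru_cache(maxsize=None)
    def solve(items: Tuple[int, ...]) -> int:
        if len(items) <= 1:
            return 0  # a single leaf needs no further comparison
        total = sum(w[i] for i in items)
        # Less-than comparison: splits the sorted candidates into a prefix and a suffix.
        best = min(solve(items[:t]) + solve(items[t:]) for t in range(1, len(items)))
        if allow_equality:
            # Equality comparison x = a_i?: isolates item i and leaves a set with a "hole".
            for i in items:
                best = min(best, solve(tuple(j for j in items if j != i)))
        return total + best

    return solve(tuple(range(len(w))))


print(bcst_cost([0, 1, 0]))                        # with equality comparisons
print(bcst_cost([0, 1, 0], allow_equality=False))  # less-than comparisons only
```

The first call prints 1, obtained by testing the middle item for equality at the root. The second call prints 2.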

Several years ago, Spuler [13] exhibited an $O(n^5)$-time algorithm to find the
optimal BCST. His algorithm was based on earlier algorithms to find an optimal
split tree [5,7,11]. (Each node in a split tree [12] has one less-than comparison and
one equality comparison, possibly on different items, leading to 3-way branching.)

In this paper we revisit the problem of finding the optimal BCST in the
case of just success frequencies. Using a new approach we show that the optimal
BCST can be computed in time $O(n^4)$. The algorithm's correctness depends on
our result that if all the frequencies are less than one fourth of the sum of the
frequencies, then the root of an optimal BCST cannot be an equality comparison.

1.1 Practical Motivation 

One motivation for our study comes from an increased interest in high perfor- 
mance method dispatching that is required for object-oriented programs. Cham- 
bers and Chen [2] describe a dispatch tree, with both equality and less-than com- 
parisons, to efficiently look up methods in object-oriented programs. A method 
is always found so there is no chance of failure. Chambers and Chen employ an 
interesting and effective heuristic to find a good dispatch tree. They left open 
the question of how to find an optimal dispatch tree. Dispatch trees are actually 
slightly more general than our BCST’s. We focus on the more restricted problem 
of finding an optimal BCST. 



1.2 Traditional Binary Search Trees 

There are several efficient algorithms for finding optimal BST's. A standard
$O(n^3)$-time dynamic program can be used, but the time can be improved to
$O(n^2)$ by using a clever technique [9].
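For reference, the standard cubic dynamic program for the traditional three-way problem (success frequencies only) can be sketched in Python as follows. This is the textbook interval recurrence, not Knuth's quadratic refinement. The function name is hypothetical, and the cost convention is our assumption: it charges each node visited, i.e. $\sum_i w_i(\mathrm{depth}_i + 1)$ with the root at depth 0, so the three-way test at a node counts as one step.

```python
from typing import Sequence


def optimal_bst_cost(weights: Sequence[int]) -> int:
    """Textbook O(n^3) interval DP for a traditional (three-way) optimal BST."""
    n = len(weights)
    prefix = [0] * (n + 1)
    for i, wi in enumerate(weights):
        prefix[i + 1] = prefix[i] + wi
    # e[i][j] = optimal cost for the keys i..j-1 (half-open); empty intervals cost 0.
    e = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            weight = prefix[j] - prefix[i]  # every key in the interval passes the root
            # Try each key r in [i, j) as the root of this interval.
            e[i][j] = weight + min(e[i][r] + e[r + 1][j] for r in range(i, j))
    return e[0][n]


print(optimal_bst_cost([0, 1, 0]))  # middle key at the root
print(optimal_bst_cost([1, 1, 1]))  # depths 0, 1, 1 give 1 + 2 + 2
```

The only subproblems are intervals of consecutive keys, which is exactly the property that fails once equality comparisons create "holes".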

In a BCST, after a few equality comparisons are done, the resulting subproblem,
with its “holes” due to failed equality comparisons, need not correspond to
an interval $\{i, i+1, i+2, \ldots, j\}$ of indices. The traditional dynamic-programming
algorithm relies on the fact that the only subproblems that arise
are intervals $\{i, i+1, i+2, \ldots, j\}$. A priori, with equality comparisons, one
could recursively generate a subproblem corresponding to an arbitrary subset
of $\{1, 2, \ldots, n\}$. We will demonstrate, however, that the number of subproblems
needed to find the optimal BCST is $O(n^3)$.

We show that there is always a tree having only less-than comparisons which 
has cost at most one more than the optimum when both types of comparisons 
are allowed, if the access frequencies sum to one. 

2 Preliminaries 

An input is specified by a sorted sequence $(a_1, a_2, \ldots, a_n)$ with corresponding
nonnegative weights $(w_1, w_2, \ldots, w_n)$. A solution is a binary comparison search
tree (BCST), which is a binary tree with each internal node labeled with either an equality
comparison of the form $x = a_i$? or a less-than comparison of the form $x < a_i$?.
For both types of comparisons the two outcomes are “yes” and “no”, and we will
assume canonically that the left branch of the node representing the comparison
corresponds to the “yes” answer. For any BCST $T$, there is a bijection between
the leaves of the tree and the input items. For a tree $T$, we let $\mathcal{L}(T)$ denote the
leaf set of $T$.

Definition 1. The weight of a node $v$ (denoted $w(v)$) in a BCST $T$ is defined
as follows. If $v$ is a leaf, then $w(v)$ is the weight of the input item labeling $v$.
Otherwise, the weight of a node is the sum of the weights of its children. (In
other words, the weight of $v$ is the sum of the weights of all leaves which are
descendants of $v$.) Similarly, define the weight of a tree to be the sum of the
weights of its leaves.



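Definition 2. The cost of a BCST $T$ is defined to be
$c(T) = \sum_{\ell \in \mathcal{L}(T)} w(\ell)\,\mathrm{depth}(\ell)$, where the depth of a node is the length of the path from
the root to that node. Equivalently, $c(T) = \sum_{v \in V(T) \setminus \mathcal{L}(T)} w(v)$, where $V(T)$ is the set of
nodes of $T$; that is, the cost is the total weight of the internal nodes.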
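The two forms of Definition 2 can be compared directly on a small tree. In this Python sketch the `Leaf`/`Node` classes and the function names are hypothetical. Items are indexed from 0, and, following the canonical convention, the `yes` child is the left child.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    item: int  # index (from 0) of the input item stored at this leaf


@dataclass
class Node:
    kind: str  # "eq" for the test x = a_key?, "lt" for x < a_key?
    key: int
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Node]


def weight(t: Tree, w: List[int]) -> int:
    """w(v) from Definition 1: a leaf's item weight, else the sum over children."""
    if isinstance(t, Leaf):
        return w[t.item]
    return weight(t.yes, w) + weight(t.no, w)


def leaf_depths(t: Tree, depth: int = 0) -> List[Tuple[int, int]]:
    """(item, depth) for every leaf, with the root at depth 0."""
    if isinstance(t, Leaf):
        return [(t.item, depth)]
    return leaf_depths(t.yes, depth + 1) + leaf_depths(t.no, depth + 1)


def cost_by_leaves(t: Tree, w: List[int]) -> int:
    """First form of Definition 2: sum of w(leaf) * depth(leaf)."""
    return sum(w[item] * d for item, d in leaf_depths(t))


def cost_by_internal_nodes(t: Tree, w: List[int]) -> int:
    """Second form of Definition 2: sum of w(v) over internal nodes v."""
    if isinstance(t, Leaf):
        return 0
    return weight(t, w) + cost_by_internal_nodes(t.yes, w) + cost_by_internal_nodes(t.no, w)


w = [0, 1, 0]
# Test x = a_1? first (0-based), then separate the two remaining items with x < a_2?.
with_eq = Node("eq", 1, Leaf(1), Node("lt", 2, Leaf(0), Leaf(2)))
lt_only = Node("lt", 1, Leaf(0), Node("lt", 2, Leaf(1), Leaf(2)))
for tree in (with_eq, lt_only):
    print(cost_by_leaves(tree, w), cost_by_internal_nodes(tree, w))
```

Both trees use the weights (0, 1, 0). The loop prints `1 1` and then `2 2`, so the two formulas agree on each tree.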

An optimal BCST is one with minimal cost. We need several more definitions 
and preliminary lemmas en route to our optimal BCST algorithm. 

Definition 3 . The depth of a subtree is the depth of its root. 



Definition 4. The side-weight of a node $v$ (denoted $sw(v)$) in a BCST is defined
as follows. If $v$ is a leaf, then $sw(v) = 0$. If $v$ is a node representing an equality
comparison (henceforth referred to as an equality node), $sw(v) = w(a_i)$, where
$a_i$ is the input item tested for equality at $v$. If $v$ is a less-than node with children
$x$ and $y$, then $sw(v) = \min\{w(x), w(y)\}$.
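Definition 4 translates into a few lines of Python. The tree encoding below is a hypothetical choice made for brevity: internal nodes are tuples `(kind, key, yes, no)` and leaves are bare item indices. The example tree is arbitrary and is not claimed to be optimal.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is an int (item index, from 0); an internal node is a
# tuple (kind, key, yes_subtree, no_subtree) with kind "eq" (x = a_key?) or "lt" (x < a_key?).
Tree = Union[int, Tuple[str, int, "Tree", "Tree"]]


def weight(t: Tree, w: List[int]) -> int:
    """Total weight of the leaves below t (Definition 1)."""
    if isinstance(t, int):
        return w[t]
    _, _, yes, no = t
    return weight(yes, w) + weight(no, w)


def side_weight(t: Tree, w: List[int]) -> int:
    """sw(v) from Definition 4."""
    if isinstance(t, int):
        return 0  # leaves have side-weight 0
    kind, key, yes, no = t
    if kind == "eq":
        return w[key]  # weight of the item tested for equality
    return min(weight(yes, w), weight(no, w))  # weight of the lighter child


w = [3, 1, 4, 1, 5]
# Root tests x = a_4?; below it, x < a_2? separates {0, 1} from {2, 3}.
tree = ("eq", 4, 4, ("lt", 2, ("lt", 1, 0, 1), ("eq", 2, 2, 3)))
below = tree[3]
print(side_weight(tree, w), side_weight(below, w),
      side_weight(below[2], w), side_weight(below[3], w))
```

This prints `5 4 1 4`. The root tests an item of weight 5, and the less-than node below it has children of weights 4 and 5, so its side-weight is 4.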



Lemma 1. Let $S$ be a sequence of items with associated weights and let $S_1, S_2$
be a “partition” of $S$ into two complementary subsequences. Let $T$ be a BCST for
$S$. Then there are BCSTs $T_1$ for $S_1$ and $T_2$ for $S_2$ with $c(T_1) + c(T_2) \le c(T)$.
Proof. Let $T$ be a BCST for $S$. Let $T_1$ be obtained from $T$ by removing all leaves
which are in $S_2$ and repeatedly “shorting out” any node with one child. Clearly
$T_1$ is a BCST for $S_1$. Similarly obtain a BCST $T_2$ for $S_2$. No leaf becomes deeper in
this process, so it is immediate from the (first) definition of the cost of a BCST that
$c(T_1) + c(T_2) \le c(T)$, and the lemma follows. ■
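The construction in the proof of Lemma 1 is easy to mechanize. This Python sketch uses hypothetical names and the same hypothetical nested-tuple encoding with 0-based leaves. It deletes the leaves outside a chosen subsequence and shorts out any node left with one child. Because no surviving leaf gets deeper, the costs of the two restricted trees add up to at most $c(T)$.

```python
from typing import FrozenSet, List, Optional, Tuple, Union

# Hypothetical encoding: a leaf is an int (item index); an internal node is a tuple
# (kind, key, yes_subtree, no_subtree) with kind "eq" or "lt".
Tree = Union[int, Tuple[str, int, "Tree", "Tree"]]


def restrict(t: Tree, keep: FrozenSet[int]) -> Optional[Tree]:
    """Remove leaves outside `keep`, shorting out any node left with one child."""
    if isinstance(t, int):
        return t if t in keep else None
    kind, key, yes, no = t
    new_yes, new_no = restrict(yes, keep), restrict(no, keep)
    if new_yes is None:
        return new_no  # at most one child survives: short the node out
    if new_no is None:
        return new_yes
    return (kind, key, new_yes, new_no)


def cost(t: Optional[Tree], w: List[int], depth: int = 0) -> int:
    """c(T) = sum over leaves of weight * depth (0 for an empty tree)."""
    if t is None:
        return 0
    if isinstance(t, int):
        return w[t] * depth
    _, _, yes, no = t
    return cost(yes, w, depth + 1) + cost(no, w, depth + 1)


w = [2, 1, 3, 1]
T = ("lt", 2, ("eq", 0, 0, 1), ("lt", 3, 2, 3))
T1 = restrict(T, frozenset({0, 2}))
T2 = restrict(T, frozenset({1, 3}))
print(T1, T2)
print(cost(T1, w), cost(T2, w), cost(T, w))
```

In this example $T$ has cost 14, while its restrictions to the items with 0-based indices $\{0, 2\}$ and $\{1, 3\}$ cost 5 and 2.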

Note that if $T$ does not have equality comparisons, then neither do $T_1$ and $T_2$.

Lemma 2. Let $T$ be an optimal BCST. If $u$ is the parent of $v$ in $T$, then
$sw(u) \ge sw(v)$.

Proof. Since the side-weight of a leaf is 0, we may assume that neither $u$ nor $v$
is a leaf.

Case 1: Both $u, v$ are less-than comparisons. Assume without loss of generality
that $v$ is a right child of $u$. Let $T_1$ be the subtree rooted at the child
of $u$ which is not $v$. Let $T_2, T_3$ be the subtrees rooted, respectively, at the
left and right children of $v$. Let $\alpha_i$ denote the weight of $T_i$, $1 \le i \le 3$. Now
$sw(u) = \min\{\alpha_1, \alpha_2 + \alpha_3\}$ and $sw(v) = \min\{\alpha_2, \alpha_3\}$. For a contradiction,
assume $sw(u) < sw(v)$, i.e., $\min\{\alpha_1, \alpha_2 + \alpha_3\} < \min\{\alpha_2, \alpha_3\}$, which implies that
$\alpha_1 < \alpha_2$ and $\alpha_1 < \alpha_3$. Now rotate $v$ upward along the edge to its parent. While $T_2$
stays at the same depth, $T_1$ moves down and $T_3$ moves up, each by one level.
The increase in cost is $\alpha_1 - \alpha_3 < 0$. This contradicts the optimality of $T$.

Case 2: Node $u$ is a less-than comparison $x < a_1$? and $v$ is an equality
comparison $x = a_2$?. Let $T_1$, of weight, say, $\alpha_1$, be the subtree rooted at the child
of $u$ which is not $v$. Let $\alpha_2 = w_2$ and let $\alpha_3$ be the weight of the subtree $T_3$ rooted
at the child of $v$ not corresponding to $a_2$. We have $sw(u) = \min\{\alpha_1, \alpha_2 + \alpha_3\}$,
$sw(v) = \alpha_2$. For a contradiction, assume that $sw(u) < sw(v)$, i.e.,
$\min\{\alpha_1, \alpha_2 + \alpha_3\} < \alpha_2$. Hence $\alpha_1 < \alpha_2$.

Again, rotate $v$ up along the edge to $u$, i.e., replace $u$ by the comparison
$x = a_2$?. Replace $u$'s right child by $x < a_1$?. From that node's left and right
children, respectively, hang $T_1$ and $T_3$. Tree $T_1$ moves down one level, $T_3$ stays
at the same level, while $a_2$, of weight $\alpha_2$, moves up one level. The net increase in
cost is $\alpha_1 - \alpha_2 < 0$, contradicting the optimality of $T$.

Case 3: Both $u, v$ are equality comparisons, say, $u$ with an item $a_1$ of weight
$\alpha_1$, $v$ with an item $a_2$ of weight $\alpha_2$. We have $sw(u) = \alpha_1$, $sw(v) = \alpha_2$. For a
contradiction, assume $\alpha_1 < \alpha_2$. Swap the comparisons in $u$ and $v$. Item $a_1$ moves
down, $a_2$ moves up. The increase in cost is $\alpha_1 - \alpha_2 < 0$, a contradiction.

Case 4: Node $u$ is an equality comparison $x = a_1$? and $v$ is a less-than
comparison $x < a_2$?. Let $\alpha_1$ be the weight of $a_1$, and let $T_2$ and $T_3$, of weights
$\alpha_2$ and $\alpha_3$, respectively, be the subtrees hanging off $v$. Once again assume for
a contradiction that $sw(u) < sw(v)$, i.e., $\alpha_1 < \min\{\alpha_2, \alpha_3\}$. Rotate $v$ upward
and make the comparison $x = a_1$? at the appropriate child of $v$. Then exactly
one of $T_2$ and $T_3$ moves up while $a_1$ moves down. The increase in cost, $\alpha_1 - \alpha_2$
or $\alpha_1 - \alpha_3$, is again negative, a contradiction. ■
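Lemma 2 can be spot-checked numerically. The sketch builds one optimal BCST by brute force and verifies that side-weights never increase from parent to child. It is only a sanity check on a few hand-picked weight vectors, not a substitute for the proof. All names are hypothetical and the running time is exponential.

```python
from functools import lru_cache
from typing import Iterator, Sequence, Tuple, Union

# Hypothetical encoding: a leaf is an int (item index); an internal node is a tuple
# (kind, key, yes_subtree, no_subtree) with kind "eq" (x = a_key?) or "lt" (x < a_key?).
Tree = Union[int, Tuple[str, int, "Tree", "Tree"]]


def optimal_tree(w: Sequence[int]) -> Tree:
    """One optimal BCST, found by brute force over item subsets (tiny inputs only)."""

    @lru_cache(maxsize=None)
    def solve(items: Tuple[int, ...]) -> Tuple[int, Tree]:
        if len(items) == 1:
            return 0, items[0]
        candidates = []
        for t in range(1, len(items)):  # root x < a_{items[t]}?
            (c1, t1), (c2, t2) = solve(items[:t]), solve(items[t:])
            candidates.append((c1 + c2, ("lt", items[t], t1, t2)))
        for i in items:  # root x = a_i?, whose "yes" child is the leaf i
            c, rest = solve(tuple(j for j in items if j != i))
            candidates.append((c, ("eq", i, i, rest)))
        best_cost, best_tree = min(candidates, key=lambda pair: pair[0])
        return sum(w[i] for i in items) + best_cost, best_tree

    return solve(tuple(range(len(w))))[1]


def weight(t: Tree, w: Sequence[int]) -> int:
    return w[t] if isinstance(t, int) else weight(t[2], w) + weight(t[3], w)


def side_weight(t: Tree, w: Sequence[int]) -> int:
    if isinstance(t, int):
        return 0
    kind, key, yes, no = t
    return w[key] if kind == "eq" else min(weight(yes, w), weight(no, w))


def parent_child_pairs(t: Tree) -> Iterator[Tuple[Tree, Tree]]:
    if not isinstance(t, int):
        for child in (t[2], t[3]):
            yield t, child
            yield from parent_child_pairs(child)


def side_weights_monotone(t: Tree, w: Sequence[int]) -> bool:
    """Lemma 2: sw(parent) >= sw(child) along every edge."""
    return all(side_weight(u, w) >= side_weight(v, w) for u, v in parent_child_pairs(t))


for w in ([1, 0, 1, 1, 0, 1], [5, 2, 8, 1, 9, 3], [2, 7, 1, 8, 2, 8]):
    print(w, side_weights_monotone(optimal_tree(w), w))
```

Each line should report `True`, since the brute-force tree is optimal.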
Corollary 1. If $n > 2$ and the root of an optimal BCST is an equality node,
then the item tested for equality must be a largest-weight item. If $n \le 2$, then all
possible BCSTs have the same cost.

We omit the proof. 

Corollary 2. If there is an item $a_m$ such that $w_m > \frac{1}{2}\sum_i w_i$, then there is
an optimal BCST having an equality comparison with $a_m$ at the root.

We omit the proof. We will see later that this factor of 1/2 can be reduced 
to 4/9. 

3 Thresholds 

In this section we assume that the sum of the weights is 1 and we refer to the 
weights as probabilities. 

Intuitively, if the maximum probability is large, there should be an optimal 
BCST whose root comparison is an equality comparison, which, by Corollary 1, 
must be with an item of maximum probability. Analogously, one would expect 
that if the maximum probability is small, there should not exist an optimal 
BCST with a root equality comparison. We study in this section the relationship 
between the maximum probability and the existence of an optimal BCST with 
a root equality comparison. 

If the maximum is very small, then (as we will see) there cannot be an optimal
BCST with an equality comparison at the root. Let us define $\lambda$ to be the
supremum of all $p$ such that for any input whose maximum probability is at most
$p$, there is no optimal BCST with an equality comparison at the root. We will
prove that if the maximum probability is less than 1/4, then there is no optimal
BCST with an equality comparison at the root (hence $\lambda \ge 1/4$), and there is
an instance with maximum probability 1/4 which has an optimal BCST with a
root equality comparison (hence $\lambda \le 1/4$). So $\lambda = 1/4$.

How large should the maximum probability be in order to guarantee the
existence of an optimal BCST with a root equality comparison? By Corollary 2,
if the maximum probability exceeds 1/2, then there is an optimal BCST in which the root
is an equality comparison. Must there be an optimal BCST with an equality
comparison at the root if, instead, the maximum probability is, say, 0.4? Let us
define $\mu$ to be the infimum of all $p$ such that for any input whose maximum
probability is at least $p$, there is an optimal BCST with an equality comparison
at the root. We will prove that if the maximum probability is at least 4/9, then
there is an optimal BCST having an equality comparison at the root (hence
$\mu \le 4/9$), whereas there are instances with maximum probability approaching
3/7 from below which have no optimal BCST with an equality comparison at
the root (hence $\mu \ge 3/7$). So $3/7 \le \mu \le 4/9$.

Later, we will use the fact that $\lambda \ge 1/4$ to design a polynomial-time algorithm
to find the optimal BCST.

We will use left or right subtree of T to mean the subtree of T rooted at the 
left or right child of the root of T. 







Richard Anderson et al. 



Theorem 1. $\lambda = 1/4$.

Proof. First we prove that $\lambda \ge 1/4$. Suppose, for a contradiction, that $T$ is an
optimal BCST where input item $b_0$ ($= a_i$ for some $i$) with weight less than 1/4
is tested for equality at the root.

Then the side-weight of the root is equal to $w(b_0)$, which is less than 1/4. By
Lemma 2 the side-weight of every other node in the tree is less than 1/4.

For any internal equality comparison $x = a_i$? in a node $v$, let us call the
branch leading to the child of $v$ having leaf $a_i$ the side branch, and the other
branch the main branch. For a less-than comparison $x < a_i$? at node $v$, we call
the branch leading to the child of $v$ of lesser weight the side branch, breaking
a tie arbitrarily, and we name the other branch the main branch. The weight of the
child along the side branch from $v$ is always $sw(v)$.

Let $r = v_0, v_1, v_2, v_3, \ldots, v_l$ be the nodes, in order, along the unique path from
the root to the leaf $v_l$ along main branches. Let $s_i$, $0 \le i \le l-1$, be the child of $v_i$
on the side branch. Then $sw(v_i) = w(s_i)$. Furthermore, $sw(v_i) \le sw(v_0) < 1/4$,
by Lemma 2. Now

$$w(T) = [w(s_0) + w(s_1) + w(s_2) + \cdots + w(s_{l-1})] + w(v_l).$$

By Corollary 1, $w(v_l) \le w(s_0) = sw(v_0) < 1/4$. Hence $w(T) \le (l+1)\,sw(v_0)$. If
$l \le 3$, then $w(T) \le 4 \cdot sw(v_0) < 1$, contradicting the fact that $w(T) = 1$. Hence
$l \ge 4$.

Now let $T_1, T_2, T_3$ be the trees hanging off the side branches from $v_1, v_2, v_3$,
respectively, and let $T_4$ be the tree hanging off the main branch from $v_3$. Note
specifically that $w(b_0) \ge w(T_1) \ge w(T_2) \ge w(T_3)$. Figure 1 illustrates this
configuration. In this and other figures in this section, we have departed from
the usual convention and chosen to depict the side branches as left children and
the main branches as right children, since there may be insufficient information
to pin down which branch is the “yes” branch.




Fig. 1. Tree T expanded along main branches to four levels 



A convention will be useful in the proof. Since the identities of the items
are actually irrelevant (only the weights matter), we assume that $a_i = i$,
$i = 1, 2, \ldots, n$. Now each node $v_i$ along the main branch from the root has either the
comparison $x = b_i$? or the comparison $x < b_i$?, where $b_i \in \{1, 2, \ldots, n\}$. In either
case we will define $b_i \in \{1, 2, 3, \ldots, n\}$ to be the cut-point associated with the
comparison at $v_i$. Pictorially, we will represent the cut-points on a number line
with an “x” marking a cut-point corresponding to an equality node and a vertical
line marking the cut-point corresponding to a less-than node. Thus our picture
has a number line with four cut-points labeled $b_0, b_1, b_2$, and $b_3$, corresponding
to vertices $v_0, v_1, v_2, v_3$, respectively.

Note the following fact: If $v_i$ is a less-than node with comparison $x < b_i$?
having cut-point $b_i$, then for all $j > i$, the cut-points $b_j$ must occur on the same
side of $b_i$, namely the side which is along the main branch at $v_i$. In other words, if the
main branch is the “no” branch, corresponding to $x \ge b_i$, then $b_j \ge b_i$ for all $j > i$,
and if the main branch is the “yes” branch, corresponding to $x < b_i$, then $b_j < b_i$ for all $j > i$.
The tree $T_i$ contains only items from the other side of $b_i$.

Since $T_3$ and $T_4$ are symmetrically located in $T$, we will assume without loss
of generality that $T_3$ contains items to the left of $b_3$ and $T_4$ contains $b_3$ and items
to the right.

The idea of the proof is to show that we can rebalance the “skinny” tree
shown in Figure 1 (making the root node a less-than node) and reduce its cost,
thereby contradicting the optimality of this tree. Since there are four cut-points
corresponding to the tree in Figure 1, it is natural that the balanced tree $T'$
should have at its root a less-than node which splits these cut-points equally.
We will call the less-than comparison at the root of $T'$ the dividing cut.

There are two main cases. 

1. If the middle two cut-points $b_i$ and $b_j$ of $T$ both correspond to equality nodes,
then we will let the dividing cut be $x < b_j$?, where $b_j > b_i$.

2. Otherwise, (at least) one of the middle cut-points corresponds to a less-than
comparison $x < b_i$?. In this case, (choose one and) let this less-than
comparison be the dividing cut.

Case 1: Note that in this case there are two cut-points to be dealt with on
either side of the dividing cut. Let $m, p$, $0 \le m < p \le 3$, be such that the set
of two cut-points to the left of $b_j$ is $\{b_m, b_p\}$. Then we perform the comparison
occurring in node $v_m$ of $T$ at the left child of the root of $T'$ and perform the
comparison occurring in node $v_p$ of $T$ at the appropriate child of this left child.
At the other subtree of this left child, put the optimal BCST for the appropriate
set of items. We similarly handle the right side of the dividing cut: where the
rightmost two cut-points are $b_q, b_r$ with $q < r$, do the comparison with $b_q$ and
then the comparison with $b_r$. Intuitively, since $m < p$, $w(T_m) \ge w(T_p)$ and we
want to have $T_m$ occur higher in $T'$. This is the reason for the above ordering of
comparisons.

Note that in this case, the dividing cut introduces a new split and thereby
fractures one of the subtrees in $T$. Note also that the subtree fractured must be
$T_3$ or $T_4$. To see this, note that in order for $T_i$ to be fractured, $v_i$ must be a
less-than comparison (otherwise $T_i$ consists of a single node) and hence $b_i$ must
be one of the two end cut-points. If $b_i$ is the leftmost cut-point and $i < 3$, then
since $b_3$ occurs to the right of $b_i$, $T_i$ must be to the left and is not fractured. A
similar argument holds if $b_i$ is the rightmost cut-point. Whichever of $T_3$ or $T_4$ is
fractured, we will let $S_1$ and $S_2$ denote the subtrees for the two pieces obtained
using Lemma 1. Figure 2 shows one example scenario that falls within this case,
and the resulting $T'$.

Fig. 2. Example scenario where the middle two cut-points are equalities, and the
resulting tree $T'$

We will compare the costs of $T$ and $T'$ by looking at the changes in the depths
of (the leaf corresponding to) $b_0$ and the roots of $T_1$, $T_2$, and $T_3$. The depth of $b_0$
goes from 1 to 2 since the comparison $x = b_0$? will be done at a child of the root
of $T'$. There are two cases for the change in the depth of $T_1$. If $b_1$ occurs on the
same side of the dividing cut as $b_0$, then the depth of the root of $T_1$ goes from 2
to 3. Otherwise, it remains 2. In the former case, the depth of $T_2$ goes from 3 to
2, while in the latter case, the depth of $T_2$ remains 3. Since $w(T_1) \ge w(T_2)$, the
worst possible improvement is when the depth of $T_1$ goes from 2 to 3. Note that
the depth of all three pieces that constitute the original $T_3$ and $T_4$ goes from 4
to 3.

Using Lemma 1 for the fractured subtree, and the fact that
$w(b_0) + w(T_1) + w(T_2) + w(T_3) + w(T_4) = w(T) = 1$,

$$c(T') - c(T) \le w(b_0) + [w(T_1) - w(T_2)] - w(T_3) - w(T_4) = w(b_0) + w(T_1) - (1 - w(b_0) - w(T_1)) = 2(w(b_0) + w(T_1)) - 1.$$

Since $w(T_1) \le w(b_0) < 1/4$, the bound above is negative, showing
that the cost of $T'$ is less than the cost of $T$ and contradicting the assumption
that $T$ is optimal.

Case 2: Since one of the four cut-points of $T$ has already been dealt with as
the dividing cut, we will have two cut-points to take care of on one side of the
dividing cut and only one to take care of on the other. Just as in Case 1, the two
cut-points on one side of the dividing cut are sequenced in the same order in $T'$
as in $T$. The subtrees $T_i$, $i = 1, \ldots, 4$, will not be fractured in this case since the
cuts we will use in $T'$ are the same as the cuts used in $T$.

Consider first the situation where the dividing cut corresponds to cut-point
$b_1$. Since $b_2$ and $b_3$ are on the same side of $b_1$, $b_0$ must be the lone cut-point
on one side and $b_2$ and $b_3$ must be on the other side. In the resulting tree
$T'$, the depth of $b_0$ goes from 1 to 2, the depth of $T_1$ stays at 2, the depth
of $T_2$ goes from 3 to 2 and the depths of $T_3$ and $T_4$ go from 4 to 3. Thus
$c(T') - c(T) \le w(b_0) - w(T_2) - w(T_3) - w(T_4) < 0$, since
$w(T_2) + w(T_3) + w(T_4) = 1 - w(b_0) - w(T_1) > 1/2 > w(b_0)$; this is a contradiction.

Next consider the case where the dividing cut corresponds to cut-point $b_2$.
An example of this situation when $b_3$ is the lone cut-point on one side of the
dividing cut is shown in Figure 3. In this case, the depths of $b_0$ and $T_1$ increase
by 1, the depth of $T_2$ is unchanged and the depths of $T_3$ and $T_4$ decrease by 2.
Since $w(T_3) + w(T_4)$ is at least 1/4 and $w(b_0) + w(T_1)$ is less than 1/2, the net
change in cost, $w(b_0) + w(T_1) - 2(w(T_3) + w(T_4))$, is negative, again contradicting
the optimality of $T$. If the lone
cut-point on one side of the dividing cut is either $b_0$ or $b_1$, then in the resulting
$T'$ the depth of $b_0$ increases by 1, the depth of $T_1$ is unchanged and the depths
of $T_2$, $T_3$ and $T_4$ decrease by 1, again yielding a net reduction in cost.



Fig. 3. Example scenario where $b_2$ is the dividing cut, and the resulting tree $T'$

Finally consider the situation where the dividing cut corresponds to cut-point
$b_3$. If $b_2$ is the sole cut-point on one side of the dividing cut, then the depths of $b_0$ and
$T_1$ increase by 1, the depth of $T_2$ drops by 1 and the depths of $T_3$ and $T_4$ drop
by at least 1, giving a net reduction in cost. If $b_0$ or $b_1$ is the sole cut-point on
one side of the dividing cut, then the depth of $b_0$ increases by 1, the depths of
$T_1$ and $T_2$ are unchanged and the depths of $T_3$ and $T_4$ drop by at least 1, again
giving us the result. Figure 4 depicts one case of this situation.

To prove that $\lambda \le 1/4$, we consider a six-item example $(a_1, a_2, \ldots, a_6)$ with
weights $(1/4, 0, 1/4, 1/4, 0, 1/4)$. By a case analysis, an optimal BCST has cost
5/2 and at least one optimal tree has root equality comparison $x = a_3$?. ■
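The six-item instance is small enough to confirm by exhaustive search. The following Python sketch uses a hypothetical helper `make_solver` and exact `Fraction` arithmetic. Items are indexed from 0, so $a_3$ is index 2. It computes the optimal cost and the cost of the best tree whose root is $x = a_3$?.

```python
from fractions import Fraction
from functools import lru_cache
from typing import Callable, Sequence, Tuple


def make_solver(w: Sequence[Fraction]) -> Callable[[Tuple[int, ...]], Fraction]:
    """Memoized brute-force optimal BCST cost for any sorted tuple of item indices."""

    @lru_cache(maxsize=None)
    def solve(items: Tuple[int, ...]) -> Fraction:
        if len(items) <= 1:
            return Fraction(0)
        total = sum((w[i] for i in items), Fraction(0))
        splits = [solve(items[:t]) + solve(items[t:]) for t in range(1, len(items))]  # x < a?
        removals = [solve(tuple(j for j in items if j != i)) for i in items]  # x = a_i?
        return total + min(splits + removals)

    return solve


q = Fraction(1, 4)
w = [q, Fraction(0), q, q, Fraction(0), q]  # weights of a_1, ..., a_6
solve = make_solver(w)
everything = tuple(range(6))  # 0-based, so a_3 is index 2
optimum = solve(everything)
root_eq_a3 = sum(w) + solve(tuple(j for j in everything if j != 2))
print(optimum, root_eq_a3)
```

Both values print as `5/2`, matching the case analysis.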



Theorem 2. $\mu \le 4/9$.

We omit the (long) proof. 



Theorem 3. $\mu \ge 3/7$.
Fig. 4. Example scenario where $b_3$ is the dividing cut, and the resulting tree $T'$



Proof. Consider the weights $(3/7 - 4\epsilon, 1/7 + \epsilon, 0, 1/7 + \epsilon, 1/7 + \epsilon, 0, 1/7 + \epsilon)$ for a
7-item example $(a_1, a_2, \ldots, a_7)$. By a case analysis, it can be shown that the unique
optimal BCST, with cost $17/7 + 3\epsilon$ for all sufficiently small positive $\epsilon$, has root
comparison $x < a_3$?. By contrast, the lowest-cost tree with root comparison
$x = a_1$? has cost $17/7 + 10\epsilon$. ■
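The same kind of exhaustive check applies to Theorem 3, with one assumption of ours: the value $\epsilon = 1/1000$ is taken to be small enough for the asymptotic claim. The helper `make_solver` is hypothetical, and indexing is 0-based.

```python
from fractions import Fraction
from functools import lru_cache
from typing import Callable, Sequence, Tuple


def make_solver(w: Sequence[Fraction]) -> Callable[[Tuple[int, ...]], Fraction]:
    """Memoized brute-force optimal BCST cost for any sorted tuple of item indices."""

    @lru_cache(maxsize=None)
    def solve(items: Tuple[int, ...]) -> Fraction:
        if len(items) <= 1:
            return Fraction(0)
        total = sum((w[i] for i in items), Fraction(0))
        splits = [solve(items[:t]) + solve(items[t:]) for t in range(1, len(items))]
        removals = [solve(tuple(j for j in items if j != i)) for i in items]
        return total + min(splits + removals)

    return solve


eps = Fraction(1, 1000)  # assumption: small enough for "sufficiently small"
h = Fraction(1, 7) + eps
w = [Fraction(3, 7) - 4 * eps, h, Fraction(0), h, h, Fraction(0), h]  # a_1, ..., a_7
solve = make_solver(w)
everything = tuple(range(7))
total = sum(w)  # equals 1
optimum = solve(everything)
# Root x < a_3? separates {a_1, a_2} (indices 0, 1) from {a_3, ..., a_7}.
root_lt_a3 = total + solve(everything[:2]) + solve(everything[2:])
# Root x = a_1? (index 0), and the best tree over every possible equality root.
root_eq_a1 = total + solve(everything[1:])
best_root_eq = min(total + solve(tuple(j for j in everything if j != i)) for i in everything)
print(optimum, root_lt_a3, root_eq_a1, best_root_eq > optimum)
```

The script prints four values: the optimal cost, the cost of the best tree rooted at $x < a_3$?, the cost of the best tree rooted at $x = a_1$?, and whether every equality root is strictly worse than the optimum.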



4 The $O(n^4)$-Time Algorithm

In this section we give an $O(n^4)$-time, dynamic-programming-based algorithm
for finding an optimal BCST on $n$ items. The algorithm relies heavily on the
fact that the initial comparison cannot be $x = a_i$? unless $a_i$ has the maximum
weight and that weight is at least 1/4 of the sum of all weights. Since the identities
of the items $a_1, a_2, \ldots, a_n$ are irrelevant, the input consists of a sequence
$\langle w_1, w_2, \ldots, w_n \rangle$ of nonnegative weights, assumed integral in this section.

Our algorithm will compute the optimal cost for each of at most $16n^3$ subproblems,
computing each one in $O(n)$ time. The notation $S = \langle i_1, i_2, i_3, \ldots, i_r \rangle$
with $1 \le i_1 < i_2 < \cdots < i_r \le n$, or $S = \{i_1, i_2, i_3, \ldots, i_r\}$, denotes the subproblem
of finding the optimal BCST for the items numbered $i_1, i_2, i_3, \ldots, i_r$, with
associated respective weights $w_{i_1}, w_{i_2}, \ldots, w_{i_r}$.

We will compute the optimal cost of each “valid” subproblem: 

Definition 5. Let $S$ be a nonempty subset of $\{1, 2, \ldots, n\}$. Let $i = \min S$,
$j = \max S$ (possibly $i = j$), and $M = \max_{l \in S} w_l$. $S$ is valid if and only if it satisfies
these two conditions: (1) $S$ is closed (strictly) downward in the interval $[i, j]$, in
that $S$ contains every $l \in [i, j]$ such that $w_l < M$, and (2) for $T = \{l \mid i \le l \le j,\ w_l = M\}$,
either $|T| \le 4$ or $T \subseteq S$.

First, a simple lemma. 

Lemma 3. The number of valid sets is at most $16n^3$.

We omit the proof. 
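Definition 5 and the counting behind Lemma 3 can be illustrated in Python. The sketch uses hypothetical function names and 0-based positions. It checks validity directly, and it also generates the valid sets from a left end $i$, a right end $j$, a maximum $M$, and a choice of which positions of $M$ to include. There are at most 15 nonempty choices when at most four such positions exist, and a single forced choice otherwise. This is one way to see a bound of the form $16n^3$, not necessarily the authors' omitted proof.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def is_valid(S: Tuple[int, ...], w: Sequence[int]) -> bool:
    """Definition 5 for a sorted, nonempty tuple S of positions (0-based)."""
    i, j = S[0], S[-1]
    M = max(w[l] for l in S)
    members = set(S)
    # (1) every position in [i, j] with weight below M belongs to S
    if any(w[l] < M and l not in members for l in range(i, j + 1)):
        return False
    # (2) the positions of M in [i, j] number at most four, or all lie in S
    T = [l for l in range(i, j + 1) if w[l] == M]
    return len(T) <= 4 or members.issuperset(T)


def valid_sets_brute_force(w: Sequence[int]) -> Set[Tuple[int, ...]]:
    """Check every nonempty subset (exponential; small n only)."""
    n = len(w)
    return {S for r in range(1, n + 1) for S in combinations(range(n), r) if is_valid(S, w)}


def valid_sets_by_counting(w: Sequence[int]) -> List[Tuple[int, ...]]:
    """Generate valid sets from (i, j, M, chosen positions of M)."""
    n = len(w)
    out = []
    for i in range(n):
        for j in range(i, n):
            for M in set(w[i:j + 1]):
                below = [l for l in range(i, j + 1) if w[l] < M]
                T = [l for l in range(i, j + 1) if w[l] == M]
                if len(T) > 4:
                    choices = [tuple(T)]  # condition (2) forces all of them
                else:
                    choices = [c for r in range(1, len(T) + 1) for c in combinations(T, r)]
                for chosen in choices:
                    S = tuple(sorted(below + list(chosen)))
                    if S[0] == i and S[-1] == j:  # i and j must be the actual endpoints
                        out.append(S)
    return out


w = [3, 1, 2, 3, 1, 3, 2, 3, 3, 1, 2, 3]
n = len(w)
built = valid_sets_by_counting(w)
print(len(built), 2 ** n - 1, 16 * n ** 3)
print(set(built) == valid_sets_brute_force(w))
```

On the example list, the first line shows the number of valid sets next to $2^n - 1$ and $16n^3$, and the second line confirms that the two enumerations coincide.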

We will compute the optimal cost of valid sets in order of increasing size
of valid sets, starting with valid sets of size one. To do so, we need a simple
name for each valid set. Give valid set $S$ a unique name $(i, j, k, v)$ as follows. Let
$i = \min S$, $j = \max S$. Let $M = \max_{l \in S} w_l$ and let $k = \min\{l \mid l \in S, w_l = M\}$
(the leftmost position of an $M$ in $S$). Let $i \le j_1 < j_2 < \cdots < j_d \le j$ be all
the positions in which $M$ appears as a weight of an item in $[i, j]$ in the original
list. Then $v = *$ if $d \ge 5$, in which case $S$ contains all the positions in $[i, j]$
corresponding to weight $M$. Otherwise, $v \in \{0, 1\}^d$ with $v_t = 1$ iff $j_t \in S$. It is
easy in $O(n)$ time to convert from the name of a valid set to an enumeration of
its items or vice versa.
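The naming scheme and its inverse can be sketched as follows, with 0-based positions and the string `"*"` standing for $v = *$. The function names are hypothetical. This simple version recomputes the positions of $M$ in $O(n)$ per call and does not follow any particular data layout from the paper.

```python
from itertools import combinations
from typing import Sequence, Tuple, Union

Name = Tuple[int, int, int, Union[str, Tuple[int, ...]]]


def is_valid(S: Tuple[int, ...], w: Sequence[int]) -> bool:
    """Definition 5 (0-based positions)."""
    i, j = S[0], S[-1]
    M = max(w[l] for l in S)
    if any(w[l] < M and l not in S for l in range(i, j + 1)):
        return False
    T = [l for l in range(i, j + 1) if w[l] == M]
    return len(T) <= 4 or all(l in S for l in T)


def name_of(S: Tuple[int, ...], w: Sequence[int]) -> Name:
    """The name (i, j, k, v) of a valid set S."""
    i, j = S[0], S[-1]
    M = max(w[l] for l in S)
    k = min(l for l in S if w[l] == M)  # leftmost position of M in S
    positions = [l for l in range(i, j + 1) if w[l] == M]  # j_1 < ... < j_d
    if len(positions) >= 5:
        return (i, j, k, "*")  # a valid S must then contain all of them
    members = set(S)
    return (i, j, k, tuple(1 if l in members else 0 for l in positions))


def set_of(name: Name, w: Sequence[int]) -> Tuple[int, ...]:
    """Inverse of name_of: every weight below M in [i, j], plus the selected M's."""
    i, j, k, v = name
    M = w[k]
    positions = [l for l in range(i, j + 1) if w[l] == M]
    chosen = positions if v == "*" else [l for l, bit in zip(positions, v) if bit]
    below = [l for l in range(i, j + 1) if w[l] < M]
    return tuple(sorted(below + chosen))


w = [2, 1, 2, 2, 1, 2, 2, 2, 1, 2]
print(is_valid((1, 2, 4, 6), w), name_of((1, 2, 4, 6), w))
valid = [S for r in range(1, len(w) + 1) for S in combinations(range(len(w)), r) if is_valid(S, w)]
print(all(set_of(name_of(S, w), w) == S for S in valid),
      len({name_of(S, w) for S in valid}) == len(valid))
```

For this example the set $\{1, 2, 4, 6\}$ is valid and gets the name `(1, 6, 2, (1, 0, 0, 1))`. The last line then confirms that every valid set is recovered from its name and that distinct valid sets get distinct names.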

The first step of the algorithm is to enumerate all valid sets, via their names,
to calculate the size of each. We prepare, for each $w \in \{1, 2, \ldots, n\}$, a list of the
names of all valid sets of size $w$. We prepare an array $cost(i, j, k, v)$ to contain,
at termination, the optimal cost of the corresponding valid set. This takes $O(n^4)$
time. The final entry for the valid set $\{1, 2, 3, \ldots, n\}$ is the optimal cost.

We initialize $cost(i, j, k, v)$ to 0 for all valid sets of size 1, and otherwise we
initialize $cost(i, j, k, v)$ to $+\infty$. Then in increasing order by $w$, we calculate the
optimal cost of each valid set $S$ of size $w$ as follows. In $O(n)$ time, we enumerate
the items $\langle i_1, i_2, \ldots, i_w \rangle$ of $S$ in increasing order and compute the weight $W$ of $S$.

We consider first the possibility that the root comparison in some optimal
BCST for $S$ is a less-than comparison. To do so, we must consider the subproblems
$\langle i_1 \rangle$ and $\langle i_2, i_3, i_4, \ldots, i_w \rangle$ (associated with $x < a_{i_2}$?), $\langle i_1, i_2 \rangle$ and
$\langle i_3, i_4, i_5, \ldots, i_w \rangle$ (associated with $x < a_{i_3}$?), ..., and $\langle i_1, i_2, i_3, \ldots, i_{w-1} \rangle$
and $\langle i_w \rangle$ (associated with $x < a_{i_w}$?). The key point is that each of these
subproblems is valid (see Lemma 4) and smaller than $S$, so we already know
the optimal cost of each. Furthermore, in $O(n)$ time in total, one can construct
the names of all of them (Lemma 6). For each $t = 1, 2, 3, \ldots, w-1$, let $C_1$ denote
the cost of the left subproblem $\{i_1, i_2, \ldots, i_t\}$ and let $C_2$ be the cost of the
right subproblem $\{i_{t+1}, i_{t+2}, \ldots, i_w\}$. We now replace $cost(i, j, k, v)$ by the cost
$W + C_1 + C_2$ of the optimal tree rooted at $x < a_{i_{t+1}}$? if it is smaller than
$cost(i, j, k, v)$.

Now we deal with the possibility that the root comparison in problem
$(i, j, k, v)$ is an equality comparison. We use the index $k$ to find the largest weight
$M$ in $S$. In $O(n)$ time, we find all occurrences of $M$ in $S$ and simultaneously sum
the weights corresponding to positions in $S$. If $M$ is less than a quarter of the
sum, then we know that the root comparison cannot be an equality comparison,
so we move on to the next valid set. Otherwise, $M$ occurs at most four times
in $S$. For each of the at most four indices $i_t$ with $w_{i_t} = M$ in $S$, generate the
subproblem $S - \{i_t\}$, which is valid (Lemma 5), generate its name, look up its
optimal cost $C$, and replace $cost(i, j, k, v)$ by $W + C$ if it is smaller. All of this
can be done in $O(n)$ time for each $t$.

It is clear that the algorithm runs in $O(n^4)$ time. Furthermore, assuming that
$cost$ is correct for smaller sets, the construction of the algorithm ensures that
the final value of $cost(i, j, k, v)$ is an upper bound on the true optimal cost, since
every candidate value is the cost of an actual tree. The
fact that the optimal tree must begin either with (1) a less-than comparison, or
(2) an equality comparison on an item which is simultaneously the max and of
at least one fourth the total weight, ensures that $cost(i, j, k, v)$ is a lower bound.
Hence the algorithm is correct.
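The recurrence can be checked against exhaustive search. The Python sketch below memoizes on the sorted tuple of items rather than on the name $(i, j, k, v)$. It therefore shows which subproblems are needed, but it does not reproduce the $O(n)$-per-subproblem bookkeeping of Lemma 6. The function names are hypothetical. The `assert` inside `solve` is a sanity check that only valid sets arise (Lemmas 4 and 5).

```python
from functools import lru_cache
from itertools import product
from typing import Sequence, Tuple


def is_valid(S: Tuple[int, ...], w: Sequence[int]) -> bool:
    """Definition 5 (0-based positions)."""
    i, j = S[0], S[-1]
    M = max(w[l] for l in S)
    if any(w[l] < M and l not in S for l in range(i, j + 1)):
        return False
    T = [l for l in range(i, j + 1) if w[l] == M]
    return len(T) <= 4 or all(l in S for l in T)


def bcst_cost_restricted(w: Sequence[int]) -> int:
    """Optimal BCST cost, trying a root equality test only on a maximum-weight
    item whose weight is at least a quarter of the total (Theorem 1, Corollary 1)."""

    @lru_cache(maxsize=None)
    def solve(S: Tuple[int, ...]) -> int:
        assert is_valid(S, w), S  # sanity check: only valid sets should arise
        total = sum(w[l] for l in S)
        if len(S) == 1 or total == 0:
            return 0  # a single leaf, or every tree has cost 0
        best = min(solve(S[:t]) + solve(S[t:]) for t in range(1, len(S)))  # x < a?
        M = max(w[l] for l in S)
        if 4 * M >= total:  # otherwise no optimal tree starts with x = a?
            for l in S:
                if w[l] == M:  # at most four such items, since 4M >= total > 0
                    best = min(best, solve(tuple(x for x in S if x != l)))
        return total + best

    return solve(tuple(range(len(w))))


def bcst_cost_bruteforce(w: Sequence[int]) -> int:
    """Unrestricted exhaustive recursion over subsets, for comparison."""

    @lru_cache(maxsize=None)
    def solve(S: Tuple[int, ...]) -> int:
        if len(S) <= 1:
            return 0
        options = [solve(S[:t]) + solve(S[t:]) for t in range(1, len(S))]
        options += [solve(tuple(x for x in S if x != l)) for l in S]
        return sum(w[l] for l in S) + min(options)

    return solve(tuple(range(len(w))))


print(bcst_cost_restricted([0, 1, 0]), bcst_cost_restricted([1, 0, 1, 1, 0, 1]))
mismatches = [w for w in product(range(4), repeat=5)
              if bcst_cost_restricted(w) != bcst_cost_bruteforce(w)]
print(len(mismatches))
```

The first line prints `1 10`. These are the two examples from the text, with the six-item example scaled by 4. The second line prints `0`, the number of weight vectors in $\{0, 1, 2, 3\}^5$ on which the restricted and the exhaustive recursions disagree.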

Lemma 4. Suppose $S = \langle i_1, i_2, \ldots, i_w \rangle$ is valid. Then $S' = \langle i_1, i_2, \ldots, i_t \rangle$
and $S'' = \langle i_{t+1}, i_{t+2}, \ldots, i_w \rangle$ are valid for all $t$, $1 \le t \le w - 1$.



We omit the proof. 

Lemma 5. Suppose $S = \langle i_1, \ldots, i_w \rangle$ is valid, $|S| \ge 2$, $M = \max\{w_{i_j} \mid 1 \le j \le w\}$,
$|\{l \mid i_1 \le l \le i_w,\ w_l = M\}| \le 4$, and $1 \le s \le w$ is such that $w_{i_s} = M$. Then
$S' = S - \{i_s\}$ is valid.

We omit the proof. 

Lemma 6. There is an $O(n)$-time algorithm that takes a valid set
$\langle i_1, i_2, \ldots, i_w \rangle$ with $1 \le i_1 < i_2 < \cdots < i_w \le n$ as input and calculates
the names of the subproblems $\langle i_1, i_2, \ldots, i_t \rangle$ and $\langle i_{t+1}, i_{t+2}, \ldots, i_w \rangle$ for all $t$.

We omit the proof, which uses running prefix and suffix computations. 



5 Comparison with Other Models 

5.1 BCST’s With Only Less-Than Comparisons 

The 3-item example with weights (0, 1, 0) has optimal cost 1 when equality and
less-than comparisons are allowed but optimal cost 2 when only less-than comparisons
are allowed. The following theorem demonstrates that this is the worst
possible case.

Theorem 4. If $T$ is a BCST in which the weights sum to 1, then there is a
BCST $T'$ that uses only less-than comparisons such that $c(T') \le c(T) + 1$.

We omit the proof. 
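Theorem 4 can be spot-checked on small inputs. After scaling by the total weight $W$, it says that the best less-than-only tree costs at most $W$ more than the best BCST. The sketch below uses hypothetical names and exhaustive search to compute the largest normalized gap over all small integer weight vectors. The (0, 1, 0) example shows that a gap of 1 is attained.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from typing import Sequence, Tuple


def bcst_cost(w: Sequence[int], allow_equality: bool) -> int:
    """Exhaustive optimal cost over item subsets (tiny inputs only)."""

    @lru_cache(maxsize=None)
    def solve(S: Tuple[int, ...]) -> int:
        if len(S) <= 1:
            return 0
        options = [solve(S[:t]) + solve(S[t:]) for t in range(1, len(S))]  # x < a?
        if allow_equality:
            options += [solve(tuple(x for x in S if x != l)) for l in S]  # x = a?
        return sum(w[l] for l in S) + min(options)

    return solve(tuple(range(len(w))))


def worst_normalized_gap(n: int, max_weight: int) -> Fraction:
    """Max over w in {0..max_weight}^n with W > 0 of (less-than-only cost - optimal cost) / W."""
    worst = Fraction(0)
    for w in product(range(max_weight + 1), repeat=n):
        total = sum(w)
        if total > 0:
            gap = Fraction(bcst_cost(w, False) - bcst_cost(w, True), total)
            worst = max(worst, gap)
    return worst


print(worst_normalized_gap(3, 3), worst_normalized_gap(5, 2))
```

Both calls print `1`: the bound of Theorem 4 is never exceeded, and it is reached by inputs such as (0, 1, 0).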
Abstract. This paper addresses the state explosion problem in automata-based
LTL model checking. To deal with large space requirements
we turn to a distributed approach. All the known methods
for automata-based model checking are based on a depth-first traversal of
the state space, which is difficult to parallelise, as the order in which
vertices are visited plays an important role. We propose an entirely
different approach, which is based on locating cycles with negative
length in a directed graph with real-valued edge lengths. Our method
allows reasonable distribution, and the experimental results confirm its
usefulness for distributed model checking.
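The central primitive mentioned in the abstract is detecting a negative-length cycle reachable from a given vertex, which is classically solved sequentially by the Bellman-Ford algorithm. The Python sketch below shows only that sequential textbook check, not the distributed algorithm of this paper. The reduction in `accepting_cycle_exists` is our assumption for illustration: edges leaving accepting states get length $-1$ and all other edges get 0, so a cycle is negative exactly when it visits an accepting state. All names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, float]


def has_reachable_negative_cycle(vertices: List[Hashable], edges: List[Edge],
                                 source: Hashable) -> bool:
    """Bellman-Ford: True iff some negative-length cycle is reachable from source."""
    INF = float("inf")
    dist: Dict[Hashable, float] = {v: INF for v in vertices}
    dist[source] = 0.0
    for _ in range(len(vertices) - 1):
        changed = False
        for u, v, length in edges:
            if dist[u] + length < dist[v]:
                dist[v] = dist[u] + length
                changed = True
        if not changed:
            return False  # distances have converged: no reachable negative cycle
    # After |V| - 1 rounds, any further improvement witnesses a reachable negative cycle.
    return any(dist[u] + length < dist[v] for u, v, length in edges)


def accepting_cycle_exists(states: List[Hashable],
                           transitions: Iterable[Tuple[Hashable, Hashable]],
                           initial: Hashable, accepting: Set[Hashable]) -> bool:
    """Assumed encoding: edges leaving an accepting state have length -1, all others 0."""
    edges = [(u, v, -1.0 if u in accepting else 0.0) for u, v in transitions]
    return has_reachable_negative_cycle(states, edges, initial)


states = [0, 1, 2, 3]
transitions = [(0, 1), (1, 2), (2, 1), (2, 3), (3, 3)]
print(accepting_cycle_exists(states, transitions, 0, {3}))
print(accepting_cycle_exists(states, transitions, 0, {0}))
```

With `accepting = {3}`, the self-loop on state 3 yields a negative cycle and the first call prints `True`. With `accepting = {0}` there is none, because state 0 lies on no cycle, so the second call prints `False`.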



1 Introduction 

Model checking is a very successful technique for verifying concurrent systems,
and many verification tools have been proposed in the last two decades. These tools
verify a desired behavioural property of a reactive system over a given model
through exhaustive enumeration of all the states reachable by the system and
the behaviours that traverse through them. As a matter of fact, the main limiting
factor in applications of such tools to practical verification problems is
the real computational power available (time and especially memory). Therefore
verification of complex concurrent systems requires techniques to avoid the
state-explosion problem [9]. Several sequential methods (partial order reductions,
on-the-fly search) to overcome this barrier have been proposed and successfully
implemented in automatic verification tools. Recently, some attempts to use
multiprocessors and networks of workstations have been undertaken.

In [23] the authors describe a parallel version of the verifier Murφ. The table
of all reached states is partitioned over the nodes of the parallel machine and
the explicit state enumeration is performed in parallel. A similar approach to
distributed reachability analysis has been taken in [18]. A distributed version
of the UPPAAL model checker based on the same idea as parallel Murφ has
been reported in [3]. Yet another distributed reachability algorithm has been
proposed in [1], but has not been implemented. We stress that all the mentioned
algorithms solve only the reachability problem and do not support full
linear-time model checking. A distributed version of the LTL model checker
SPIN [16] based on the nested depth-first search approach has been explored in [2].
Other recent papers attempt to use a distributed environment of workstations for
parallel symbolic model checking. [15] presents a parallel reachability analysis
algorithm based on BDDs, while in [4] a distributed symbolic method has been
applied to check safety RCTL properties. Papers [14,5] significantly extend the
scope of properties that can be verified by presenting distributed symbolic model
checking for the μ-calculus and the alternation-free μ-calculus.

In automata based LTL model checking the verification problem is represented as the emptiness problem of a Büchi automaton, which turns out to be equivalent to finding, in the graph corresponding to the Büchi automaton, a cycle that is reachable from an initial state and contains an accepting state. The best known algorithm for finding cycles in directed graphs is Tarjan's depth first search algorithm (DFS) [24]. The practical limitation of this algorithm is the amount of randomly accessed memory it requires. A space-efficient alternative to Tarjan's algorithm (so-called nested DFS), which reduces the amount of randomly accessed memory, exists (see e.g. [17]) and is implemented in the SPIN verification tool [16]. However, even this optimisation does not solve the state space explosion problem sufficiently.

A natural way to overcome the memory limitation is to distribute the given graph onto several processors (computers) and to perform a distributed computation. As depth first search is P-complete, efficient parallel DFS-based algorithms are unlikely to exist [21]. A completely different approach to the distributed emptiness problem is therefore needed. This paper demonstrates a methodology for reducing the automata based LTL model checking problem to the negative cycle detection problem: the problem of finding a cycle of negative length in a directed graph whose edges have real-valued lengths.

The negative cycle problem is closely related to the single-source shortest path (SSSP) problem. For SSSP, efficient PRAM algorithms working with the adjacency matrix representation of graphs are known, see e.g. [22]. However, the adjacency matrix representation is not compatible with other space-saving techniques such as on-the-fly search. Other algorithms (see [8] for an excellent survey), which are based on relaxation of the graph's edges, are inherently sequential, and their parallel versions are known only for special settings of the problem. For general digraphs with non-negative edge lengths, parallel algorithms are presented in [19,20,12]. For special classes of graphs, such as planar digraphs [25,13], graphs with a separator decomposition [10], or graphs with small tree-width [7], more efficient algorithms are known. Yet none of these algorithms is applicable to directed graphs that may contain negative cycles.

We present a scalable distributed algorithm for the negative cycle problem 
and thus for automata based model checking of LTL formulas. Our method 
parallelises the model checking problem on a network of processors with disjoint 
memory that communicate via message passing. 
The paper is organised as follows. We first review automata based LTL model checking and define the corresponding graph theoretic problem (Section 2). Its reduction to the negative cycle problem is outlined in Section 3. A distributed algorithm for the negative cycle problem is given in Section 4. Section 5 summarises the experimental results.



2 Automata Based LTL Model Checking 

The automata based approach to model checking of linear temporal logic formulas is a very elegant method developed by Vardi and Wolper [26]. The essence of using automata for model checking is that both the modelled system and the specification it is supposed to fulfil are represented in the same way, as Büchi automata.

Definition 1. A Büchi automaton is a tuple $A = (\Sigma, S, s, \rho, F)$, where

- $\Sigma$ is a finite alphabet,
- $S$ is a finite set of states,
- $s \in S$ is the initial state,
- $\rho : S \times \Sigma \to 2^S$ is a transition relation,
- $F \subseteq S$ is a set of accepting states.

A run of $A$ over an infinite word $w = a_1 a_2 \ldots$ is a sequence $s_0, s_1, \ldots$ such that $s_i \in \rho(s_{i-1}, a_i)$ for all $i \geq 1$. A run $s_0, s_1, \ldots$ over $w$ is accepting iff $s_0 = s$ and $\{t \mid t = s_i \text{ for infinitely many } i\} \cap F \neq \emptyset$. A word $w$ is accepted by $A$ if there is an accepting run over $w$. The set of words accepted by $A$ is denoted by $L(A)$.

States of the modelled finite-state system $M$ are identified with the states of a Büchi automaton $A_M$ in which all states are accepting. The set of behaviours of the system is then the language $L(A_M)$. On the other hand, for each LTL formula $\varphi$ one can construct a Büchi automaton $A_\varphi$ that accepts exactly the set of runs satisfying $\varphi$. Hence, for the system $M$ and an LTL formula $\varphi$, the verification problem is to verify whether $L(A_M) \subseteq L(A_\varphi)$ or, equivalently, whether $L(A_M) \cap L(A_{\neg\varphi})$ is empty. Moreover, one can build an automaton $A$ for $L(A_M) \cap L(A_{\neg\varphi})$ having $|M| \cdot 2^{O(|\varphi|)}$ states. We need to check this automaton for emptiness [26].

Let $A = (\Sigma, S, s, \rho, F)$ be a given automaton. Consider the directed graph $G_A = (S, E_A)$ such that $E_A = \{(u, v) \mid v \in \rho(u, a), a \in \Sigma\}$. The following assertion can be easily verified [26].

Theorem 1. Let $A$ be a Büchi automaton. Then $L(A)$ is non-empty iff $G_A$ has a cycle that is reachable from the initial state $s$ and contains some accepting state.
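To make the graph $G_A$ and the condition of Theorem 1 concrete, here is a naive check. It is a sketch under our own assumptions: $\rho$ is encoded as a Python dict from (state, letter) pairs to sets of successor states, and all function names are hypothetical. The check is quadratic and only illustrates the condition; model checkers use the DFS-based algorithms discussed next.

```python
from collections import deque

def graph_of(states, alphabet, rho):
    """Edges of G_A: (u, v) whenever v is in rho(u, a) for some letter a.
    rho is a dict from (state, letter) to a set of successor states."""
    edges = {u: set() for u in states}
    for u in states:
        for a in alphabet:
            edges[u] |= rho.get((u, a), set())
    return edges

def reachable(edges, start):
    """Vertices reachable from start (including start itself), by BFS."""
    seen, todo = {start}, deque([start])
    while todo:
        u = todo.popleft()
        for v in edges[u]:
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen

def is_nonempty(states, alphabet, rho, s, accepting):
    """Theorem 1: L(A) is non-empty iff some accepting state reachable from s
    lies on a cycle, i.e. is reachable again from one of its successors."""
    edges = graph_of(states, alphabet, rho)
    for f in reachable(edges, s) & set(accepting):
        if any(f in reachable(edges, v) for v in edges[f]):
            return True
    return False
```

For example, with transitions 0 to 1 on a, 1 to {1, 2} on b and 2 to 0 on a, the accepting state 2 lies on the reachable cycle 0, 1, 2, so the language is non-empty.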

Detection of a reachable accepting cycle in a graph corresponding to a Büchi automaton is thus at the heart of most automata based model checkers. The depth first search strategy (DFS) provides a suitable time-efficient approach. However, in large applications graphs are often too massive to fit completely inside the computer's internal memory. The resulting input/output paging between fast internal memory and slower external memory (such as disks) is then a major performance bottleneck.

In order to overcome problems with the limited size of randomly accessed memory we suggest dividing the graph among several processors. The simplest solution is to run some DFS based algorithm on those processors. Instead of paging, computation is handed over to the processor owning the related data, i.e. paging is substituted by communication. As communication among processors is rather time consuming, this approach can end up with algorithms that are comparatively slow (this finding is supported by the experiments presented in Section 5).

Our methodology is based on the reduction of the Büchi automaton emptiness problem to the problem of detecting a negative cycle in a directed graph, as illustrated in the following section.

3 Negative Cycles 

The negative cycle problem is a well-studied problem in connection with the single-source shortest path (SSSP) problem. We are given a triple $(G, s, l)$, where $G = (V, E)$ is a directed graph with $n$ vertices and $m$ edges, $l : E \to \mathbb{R}$ is a length function mapping edges to real-valued lengths, and $s \in V$ is the source vertex. The length of a path $p = \langle v_0, v_1, \ldots, v_k \rangle$ is the sum of the lengths of its constituent edges, $l(p) = \sum_{i=1}^{k} l(v_{i-1}, v_i)$. We define the shortest path length from $s$ to $v$ by $\delta(s, v) = \min\{l(p) \mid p \text{ is a path from } s \text{ to } v\}$ if there is such a path, and $\delta(s, v) = \infty$ otherwise. A shortest path from vertex $s$ to vertex $v$ is then defined as any path $p$ with length $l(p) = \delta(s, v)$. If the graph $G$ contains no cycle $c$ with negative length $l(c)$ (a negative cycle) that is reachable from the source vertex $s$, then for all $v \in V$ the shortest path length remains well-defined and the graph is called feasible. If there is a negative cycle reachable from $s$, shortest paths are not well-defined, as no path from $s$ to a vertex on the cycle can be a shortest path. If there is a negative cycle on some path from $s$ to $v$, we define $\delta(s, v) = -\infty$.

The SSSP problem is to decide whether, for a given triple $(G, s, l)$, the graph $G$ is feasible and, if it is, to compute shortest paths from the source vertex $s$ to all vertices $v \in V$. The negative cycle problem is to decide whether $G$ is feasible.

The connection between the negative cycle problem and the Büchi automaton emptiness problem is the following. A Büchi automaton corresponds to a directed graph $G_A$ as defined in Section 2. Let us assign lengths to its edges in such a way that all edges outgoing from vertices corresponding to accepting states have length $-1$ and all others have length $0$. With this length assignment the length of a cycle is minus the number of accepting vertices on it, so negative cycles coincide with accepting cycles and the problem of Büchi automaton emptiness reduces to the negative cycle problem.

Theorem 2. Let $A$ be a Büchi automaton. Let $G_A^l = (G_A, s, l)$, where $l : E_A \to \{0, -1\}$ is the length function such that $l(u, v) = -1$ iff $u \in F$. Then $L(A)$ is non-empty iff $G_A^l$ has a negative cycle reachable from $s$.
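The length function of Theorem 2 is easy to state in code. In this sketch the adjacency-dictionary encoding and the function names are our own assumptions; it shows that a cycle's length is minus the number of accepting vertices on it.

```python
def lengths_for(edges, accepting):
    """Length function of Theorem 2: l(u, v) = -1 iff u is accepting, else 0.
    edges maps each vertex to an iterable of successors; the result maps each
    vertex to a list of (successor, length) pairs."""
    return {u: [(v, -1 if u in accepting else 0) for v in succ]
            for u, succ in edges.items()}

def cycle_length(weighted, cycle):
    """Length of the cycle visiting the listed vertices in order and closing
    back to the first one."""
    total = 0
    for u, v in zip(cycle, cycle[1:] + cycle[:1]):
        total += dict(weighted[u])[v]
    return total
```

For the cycle 0, 1, 2 with only vertex 1 accepting, the single edge leaving vertex 1 contributes $-1$ and the cycle is negative.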
4 Distributed Negative Cycle Detection Algorithm

The general sequential method for solving the SSSP problem is the scanning method [11,8]. For every vertex $v$, the method maintains its distance label $d(v)$, parent vertex $p(v)$ and status $S(v) \in \{\mathit{unreached}, \mathit{labelled}, \mathit{scanned}\}$. The subgraph $G_p$ of $G$ induced by the edges $(p(v), v)$ for all $v$ such that $p(v) \neq \mathit{nil}$ is called the parent graph. Initially, for every vertex $v$, $d(v) = \infty$, $p(v) = \mathit{nil}$ and $S(v) = \mathit{unreached}$. The method starts by setting $d(s) = 0$, $p(s) = \mathit{nil}$ and $S(s) = \mathit{labelled}$. At every step, the method selects a labelled vertex $v$ and applies a scanning operation to it. During the scanning of $v$, every edge $(v, u)$ outgoing from $v$ is relaxed, which means that if $d(u) > d(v) + l(v, u)$ then $d(u)$ is set to $d(v) + l(v, u)$ and $p(u)$ is set to $v$. The status of $v$ is changed to scanned, while the status of every vertex $u$ updated in this way is changed to labelled. If all vertices are either scanned or unreached, then $d$ gives the shortest path lengths and $G_p$ is the graph of shortest paths.

Different strategies for selecting a labelled vertex to be scanned next lead to different algorithms. Our strategy derives from the Bellman-Ford-Moore algorithm [8], which uses a FIFO strategy to select a labelled vertex. The next vertex to be scanned is removed from the head of the queue; a vertex that becomes labelled is added to the tail of the queue if it is not already on the queue.
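A minimal sequential sketch of the scanning method with the FIFO selection rule of Bellman-Ford-Moore follows. It assumes the graph is feasible (without a reachable negative cycle it would not terminate), encodes the graph as a dict from each vertex to a list of (successor, length) pairs, and represents the labelled vertices by queue membership; the encoding and names are illustrative assumptions.

```python
from collections import deque
import math

def bellman_ford_moore(adj, s):
    """Scanning method with FIFO selection of labelled vertices.
    adj maps each vertex to a list of (successor, length) pairs.
    Assumes no negative cycle is reachable from s."""
    d = {v: math.inf for v in adj}       # unreached vertices have d = infinity
    p = {v: None for v in adj}           # parent pointers define G_p
    d[s] = 0
    queue, on_queue = deque([s]), {s}    # the labelled vertices
    while queue:
        v = queue.popleft()              # scan the vertex at the head
        on_queue.discard(v)
        for u, length in adj[v]:         # relax every outgoing edge
            if d[u] > d[v] + length:
                d[u], p[u] = d[v] + length, v
                if u not in on_queue:    # u becomes labelled: append to tail
                    queue.append(u)
                    on_queue.add(u)
    return d, p
```

On the graph with edges s to a of length 4, s to b of length 1 and b to a of length 2, the label of a is first set to 4 and later lowered to 3 when b is scanned.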

For graphs where negative cycles could exist, the scanning method must be modified to recognise the infeasibility of the graph. As in the case of scanning, various strategies are used to detect negative cycles [8]. However, not all of them are suitable for our purposes: they are either uncompetitive (for example, the time-out strategy) or unsuitable for distribution (such as the admissible graph search, which uses hardly parallelisable DFS, or the level-based strategy, which employs global data structures). For our distributed algorithm we have used the walk to root strategy.

The walk to root strategy is based on the fact that any cycle in $G_p$ is a negative cycle. Suppose the relaxation operation applies to an edge $(v, u)$ (i.e. $d(u) > d(v) + l(v, u)$) and the parent graph $G_p$ is acyclic. This operation creates a cycle in $G_p$ if and only if $u$ is an ancestor of $v$ in the current tree. This can be detected by following the parent pointers from $v$ to $s$. If the vertex $u$ lies on this path, then there is a negative cycle; otherwise the relaxation operation does not create a cycle. However, the walk to root method increases the cost of applying the relaxation operation to an edge to $O(n)$, since the cost of the search is $O(n)$. Therefore the walk to root is performed only after the underlying relaxation algorithm has performed $\Omega(n)$ work. The running time of the walk to root is thus amortised over the relaxation time, and the overall time complexity increases only by a constant factor. To preserve termination of the strategy in the distributed setting we modify its behaviour, as explained below.
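The walk to root check can be added to the sequential Bellman-Ford-Moore loop as sketched below. This is a simplified, non-distributed sketch under our own assumption that the walk is performed at every relaxation (no amortisation), so $G_p$ stays acyclic and every walk terminates; the adjacency encoding and the function name are illustrative. With integer lengths such as the 0/-1 lengths of Theorem 2 the loop always terminates: while $G_p$ is acyclic each $d(v)$ is bounded below by the length of a simple path, so only finitely many relaxations can occur.

```python
from collections import deque
import math

def has_negative_cycle(adj, s):
    """Bellman-Ford-Moore with the walk to root check at every relaxation.
    adj maps each vertex to a list of (successor, length) pairs."""
    d = {v: math.inf for v in adj}
    p = {v: None for v in adj}
    d[s] = 0
    queue, on_queue = deque([s]), {s}
    while queue:
        v = queue.popleft()
        on_queue.discard(v)
        for u, length in adj[v]:
            if d[u] > d[v] + length:
                # walk to root: would p(u) := v close a cycle in G_p?
                w = v
                while w is not None:
                    if w == u:
                        return True      # u is an ancestor of v: negative cycle
                    w = p[w]
                d[u], p[u] = d[v] + length, v
                if u not in on_queue:
                    queue.append(u)
                    on_queue.add(u)
    return False                         # feasible: d holds shortest distances
```

For instance, the graph 0, 1, 2, 0 with lengths 0, 0, -1 is reported as infeasible when edge (2, 0) is relaxed, because 0 is found on the walk from 2 to the root.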

The negative cycle detection algorithm NC we propose works in a distributed environment (no global information is directly accessible) where all processors communicate via message passing. We suppose that the set of vertices of the inspected graph is divided into disjoint subsets. The distribution is determined by the function owner, which assigns every vertex $v$ to a processor $\alpha$. For every vertex $v$, the processor $\mathit{owner}(v)$ knows its adjacency list. The distribution can be realised on-the-fly. Each processor $\alpha$ is responsible for its own part $G^\alpha = (V_\alpha, E_\alpha)$ of the graph $G$ determined by the owned subset of vertices. A good partition of vertices among processors is important because it has a direct impact on communication complexity and thus on the run-time of the program. We do not discuss it here because it is itself quite a difficult problem and depends on the concrete application.
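One simple way to realise the owner function on-the-fly is to hash a state's encoding. The sketch below is a hypothetical static partition and is not the partition used in the paper, which is left open. It uses `zlib.crc32`, which gives the same value in every process, unlike Python's built-in `hash()` of strings and bytes, which is salted per interpreter run. The `partition` helper builds the parts $G^\alpha$ from a whole adjacency dict for testing purposes.

```python
import zlib

def owner(state: bytes, num_procs: int) -> int:
    """Hypothetical partition function: the processor index depends only on
    the state's byte encoding, so no global table is needed and it can be
    evaluated on-the-fly by any processor."""
    return zlib.crc32(state) % num_procs

def partition(adj, num_procs):
    """Split an adjacency dict into the parts G^alpha owned by each processor,
    keyed by processor index; states are encoded with repr() for hashing."""
    parts = {a: {} for a in range(num_procs)}
    for v, succ in adj.items():
        parts[owner(repr(v).encode(), num_procs)][v] = succ
    return parts
```

A hash partition balances the load but ignores locality, so it can produce many cross-edges; that is exactly the trade-off the text declines to discuss.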

The main idea of the distributed algorithm NC can be summarised as follows. The distributed computation is initiated by the process Manager, which performs the necessary initialisations. All processors participating in the algorithm execute the same program. Each processor repeatedly performs the basic scanning operation on all its vertices with labelled status (procedure MAIN). Such vertices are maintained in the processor's local queue $Q^\alpha$. To process a vertex $v$ that belongs to a different processor, a message is sent to the owner of $v$. In each iteration a processor first processes the messages received from other processors. Several types of messages can arrive:

- a request to update the parameters of a vertex $u$. The procedure UPDATE compares the current value $d(u)$ with the received one. If needed, the parameters are updated and $u$ is placed into the queue.

- a request to continue a walk, satisfied by executing the WTR procedure.

- a request to continue removing marks, satisfied by executing the REM procedure.
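The message flow described above can be simulated in a single process to make it concrete. This sketch rests on our own assumptions: a round-robin schedule in which each processor first handles its inbox and then scans one vertex per round, and no WTR, REM or walk marks at all, so it only covers feasible graphs. All names are hypothetical. The dictionaries d and p are shared only for brevity; each entry is read and written solely by its owner.

```python
from collections import deque

def simulate_nc_without_wtr(adj, s, owner, num_procs):
    """Round-robin, single-process simulation of the message flow of NC.
    Processor a owns the vertices v with owner(v) == a, keeps a local FIFO
    queue of labelled vertices and an inbox of UPDATE requests."""
    inf = float("inf")
    d = {v: inf for v in adj}
    p = {v: None for v in adj}
    queues = [deque() for _ in range(num_procs)]
    inboxes = [deque() for _ in range(num_procs)]
    sent = 0

    def update(a, u, v, t):          # UPDATE(u, v, t) executed on processor a
        if d[u] > t:
            d[u], p[u] = t, v
            if u not in queues[a]:
                queues[a].append(u)

    d[s] = 0
    queues[owner(s)].append(s)
    # the Manager's termination test: all queues empty, no pending messages
    while any(queues) or any(inboxes):
        for a in range(num_procs):
            while inboxes[a]:                      # process_messages
                update(a, *inboxes[a].popleft())
            if queues[a]:
                v = queues[a].popleft()            # SCAN(v)
                for u, length in adj[v]:
                    if owner(u) == a:
                        update(a, u, v, d[v] + length)
                    else:                          # "start UPDATE(u, v, t)"
                        inboxes[owner(u)].append((u, v, d[v] + length))
                        sent += 1
    return d, p, sent
```

Running it with a single processor sends no messages, while splitting the vertices by parity turns every cross-edge relaxation into a message.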

Pseudo-code of the Distributed Algorithm NC

1  proc MAIN()  {running on each processor α}
2    stamp := 0;
3    if α = Manager then Q^α := {s}; d(s) := 0; p(s) := nil else Q^α := ∅ fi;
4    while not finished do process_messages; v := pop(Q^α); SCAN(v) od
5  end

1  proc SCAN(v)
2    foreach (v, u) ∈ E do
3      if owner(u) = α
4        then UPDATE(u, v, d(v) + l(v, u))
5        else send_message(owner(u), "start UPDATE(u, v, d(v) + l(v, u))") fi od
6  end

1  proc UPDATE(u, v, t)
2    if d(u) > t then if walk(u) ≠ [nil, nil]
3      then if owner(v) = α
4        then push(Q^α, v)
5        else send_message(owner(v), "do push(Q, v)") fi
6      else d(u) := t; p(u) := v;
7        if WTR-amortisation then WTR([u, stamp], u);
8          stamp++ fi;
9        if u ∉ Q^α then push(Q^α, u) fi fi fi
10 end
1  proc WTR([origin, stamp], at)  {Walk To Root}
2    done := false;
3    while ¬done do
4      if owner(at) = α
5      then
6        if walk(at) = [origin, stamp] →
7             send_message(Manager, "negative cycle found");
8             terminate
9        □ (at = source) ∨ (walk(at) > [origin, stamp]) →
10            if origin ∈ V_α
11              then REM([origin, stamp], origin)
12              else send_message(owner(origin),
13                     "start REM([origin, stamp], origin)") fi;
14            done := true
15       □ (walk(at) = [nil, nil]) ∨ (walk(at) < [origin, stamp]) →
16            walk(at) := [origin, stamp];
17            at := p(at)
18       fi
19     else
20       send_message(owner(at), "start WTR([origin, stamp], at)");
21       done := true fi
22   od
23 end



1  proc REM([origin, stamp], at)  {Remove Marks}
2    done := false;
3    while ¬done do
4      if owner(at) = α
5        then if walk(at) = [origin, stamp]
6          then walk(at) := [nil, nil];
7            at := p(at)
8          else
9            done := true fi
10       else send_message(owner(at), "start REM([origin, stamp], at)");
11         done := true fi
12   od
13 end



The SCAN procedure scans a vertex $v$. Every edge $(v, u)$ outgoing from $v$ is relaxed, which means that if $d(u) > d(v) + l(v, u)$ then $d(u)$ is set to $d(v) + l(v, u)$ and $p(u)$ is set to $v$. If the vertex $u$ lies on a walk to root path (i.e. it carries a WTR mark), its parameters are not changed and the vertex $v$ is placed back into the queue.

The WTR procedure is responsible for negative cycle detection. The procedure follows the parent pointers starting from the vertex where it has been invoked (origin). It is initiated after the relaxation of an edge, according to a suitable amortisation strategy (the WTR-amortisation condition becomes true every $n$-th time it is evaluated). In the distributed environment it may happen that even if the vertex $v$ does not lie on any cycle, the parent graph contains a cycle created in the meantime by some other processor. A WTR initiated from $v$ can then reach such a cycle and never finish. Amortisation causes the same problem, since a cycle may be created by a relaxation that was not followed by a walk. To fix this, each processor maintains a counter of started WTR procedures. WTR marks (variable walk) each vertex through which it proceeds with the name of the vertex where the walk was initiated (origin) and the current value of the processor counter (stamp). A cycle is detected whenever a vertex with the current origin and stamp is reached.

Moreover, more than one WTR procedure can be active at a time. In such a situation the concurrent walks could overwrite each other's marks, thus preventing the detection of a cycle. It is sufficient to complete only one of them: if there is a cycle, it will be detected. To decide which walk should continue, we assume that a total linear ordering on vertices is given. A walk with a lower origin is stopped.

There are four possible situations that can happen during the walk: 

— the procedure reaches the source vertex s (line 9). A negative cycle has not 
been detected and the REM procedure is started. 

— the procedure reaches a vertex marked with the same origin and the same 
stamp (line 6). This indicates that a negative cycle has been recognised. The 
cycle can be easily reconstructed by following parent edges. If necessary, 
the path connecting the cycle with the source vertex can be found using a 
suitable reachability algorithm. 

— the procedure reaches a non-marked vertex, a vertex already marked with 
lower origin or a vertex marked with the same origin but lower stamp 
(line 15). The vertex is marked with [origin, stamp] and the walk follows 
the parent edge. 

— the procedure reaches a vertex already marked with higher origin (line 9). 
The walk is stopped and the REM procedure is started. 
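The four situations above, i.e. the guards in lines 6, 9 and 15 of WTR, can be summarised by a small decision function. This is a sketch under our own assumptions: vertices are numbered by integers, so the total order on origins is the integer order; a mark [origin, stamp] is a Python tuple compared lexicographically; and None stands for [nil, nil]. The name `wtr_step` is hypothetical.

```python
from typing import Optional, Tuple

Mark = Optional[Tuple[int, int]]   # None plays the role of [nil, nil]

def wtr_step(current: Tuple[int, int], at_mark: Mark, at_is_source: bool) -> str:
    """What WTR with mark `current` does at vertex `at`, whose mark is at_mark."""
    if at_mark == current:
        return "cycle"                 # line 6: same origin and same stamp
    if at_is_source or (at_mark is not None and at_mark > current):
        return "stop"                  # line 9: give up this walk, start REM
    return "mark and follow parent"    # line 15: unmarked or lower mark
```

Comparing the pair lexicographically covers both "lower origin" and "same origin but lower stamp" with a single test.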

Whenever WTR has to continue at a non-local vertex, a request is sent to the owner of that vertex and the local walk is finished.

The purpose of the REM procedure is to remove marks introduced by the WTR procedure. These marks could otherwise obstruct future runs of WTR through the marked vertices. Marks to be removed are found with the help of parent edges; this is why the updating of a marked vertex is postponed (line 2 of UPDATE). The REM procedure follows the path in the parent graph starting from the origin in a similar way as WTR does. It finishes when it reaches the source vertex or a vertex carrying a different mark. However, this does not guarantee that all marks are removed at that very moment; the remaining marks will eventually be removed by some other REM procedure. The correctness of cycle detection is guaranteed because cycle detection requires equality of both origin and stamp.
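A sequential sketch of the mark removal that REM performs on a single processor follows. It assumes (our encoding, hypothetical names) that `walk` maps vertices to tuple marks or None, and that `parent` maps each vertex to its parent, with None as the parent of the source.

```python
def remove_marks(walk, parent, origin, stamp):
    """Follow parent pointers from origin, clearing marks equal to
    (origin, stamp); stop at the first vertex carrying a different mark,
    or after stepping past the source, whose parent is None."""
    at = origin
    while at is not None and walk.get(at) == (origin, stamp):
        walk[at] = None
        at = parent[at]
```

Stopping at the first foreign mark is what leaves some marks in place for a later REM, as noted above.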

The distributed algorithm terminates when either all queues of all processors 
are empty and there are no pending messages or when a negative cycle has been 
detected. The Manager process is used to detect termination and to finish the 
algorithm by sending a termination signal to all the processors. 

Theorem 3 (Correctness and Complexity). 

If $G$ has no negative cycle reachable from the source $s$, then the algorithm terminates, $d(v) = \delta(s, v)$ for all vertices $v \in V$, and the parent graph $G_p$ is a shortest path tree rooted at $s$. Otherwise the existence of a negative cycle is reported.

If $G$ is distributed over $P$ processors, each of which owns $O(n/P)$ vertices, then the worst case computation complexity is $O(n^3/P)$.

For detailed proof of the correctness and the complexity analysis see [6]. 

5 Experiments 

We have implemented the algorithm proposed in Section 4. The implementation has been done in C++, and the experiments have been performed on a cluster of eight 366 MHz Pentium PC Linux workstations with 128 Mbytes of RAM each, interconnected with a fast 100 Mbps Ethernet and using the Message Passing Interface (MPI) library.

In the implementation of the NC algorithm we have employed the following optimisation scheme. For more efficient communication between processors we do not send separate messages; instead, messages are sent in packets of a pre-specified size. The optimal size of a packet depends on the network connection and the underlying communication structure. In our case we achieved the best results for packets of about 100 single messages.
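The packet scheme can be sketched as a per-destination buffer. The class and its `send` callback are hypothetical stand-ins for whatever transport is used (for instance an MPI send); the default packet size of 100 is the value reported above. A processor would call `flush_all` before going idle, so that buffered messages cannot delay termination detection.

```python
from collections import defaultdict

class PacketBuffer:
    """Collects messages per destination processor and hands them to `send`
    as one packet once `packet_size` messages have accumulated."""

    def __init__(self, send, packet_size=100):
        self.send = send                  # send(dest, list_of_messages)
        self.packet_size = packet_size
        self.pending = defaultdict(list)

    def post(self, dest, message):
        self.pending[dest].append(message)
        if len(self.pending[dest]) >= self.packet_size:
            self.flush(dest)

    def flush(self, dest):
        if self.pending[dest]:
            self.send(dest, self.pending.pop(dest))

    def flush_all(self):
        for dest in list(self.pending):
            self.flush(dest)
```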

As far as we know, there is no other distributed algorithm for the negative cycle problem (see Section 1). Therefore our objective was to compare the performance of the NC algorithm with algorithms used in LTL model checkers. For comparison we have used the very effective nested depth first search (NDFS) algorithm [17] used in the SPIN verification tool [16]. In its distributed version the graph is divided over processors as in the NC algorithm. Only one processor, namely the one owning the current vertex in the NDFS search, executes the nested search at a time. The network is in fact running the sequential algorithm with extended memory. The worst case space complexity of NDFS is asymptotically the same as that of our algorithm NC. The worst case time complexity of NDFS is linear in the number of vertices and edges.
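For reference, the sequential nested DFS scheme that the distributed NDFS baseline executes, one processor at a time, can be sketched as follows. This is a generic recursive textbook rendition, not SPIN's implementation; `succ` (a dict from each state to its successors), `accepting` and the function name are our own assumptions. Both searches visit each state at most once, which gives the linear running time mentioned above.

```python
def nested_dfs(succ, s, accepting):
    """Outer DFS from s; in post-order, start a nested DFS from each accepting
    state (the seed) and report a cycle if the nested search reaches the seed
    again. The set of flagged states is shared by all nested searches."""
    visited, flagged = set(), set()

    def inner(v, seed):
        for w in succ[v]:
            if w == seed:
                return True              # back at the seed: accepting cycle
            if w not in flagged:
                flagged.add(w)
                if inner(w, seed):
                    return True
        return False

    def outer(v):
        visited.add(v)
        for w in succ[v]:
            if w not in visited and outer(w):
                return True
        if v in accepting:               # post-order: nested search from v
            flagged.add(v)
            return inner(v, v)
        return False

    return outer(s)
```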

We performed several sets of tests on different instances in order to verify how fast the algorithm is in practice, i.e. beyond its theoretical characterisation. Our experiments were performed on two kinds of systems, given by random graphs and by generated graphs. Graphs were generated using a simple specification language and an LTL formula. In both cases we tested graphs with and without cycles, to model faulty and correct behaviour of systems. As a real example we tested the parametrised Dining Philosophers problem. Each instance is characterised by the number of vertices and the number of cross-edges (edges whose endpoints are owned by different processors). The number of cross-edges significantly influences the overall performance of distributed algorithms.

For each experiment we report the average time in minutes and the number 
of sent messages (communication) as the main metrics. Table 1 summarises the 
achieved results. 

The experiments lead basically to the following conclusions: 

— NC algorithm is comparable with the NDFS one on all graphs. 

— NC algorithm is significantly better on graphs without negative cycles. 
Table 1. Summary of experimental results
Experiments show that, in spite of the worse theoretical worst case time complexity of the NC algorithm, in practice it can outperform the theoretically better NDFS algorithm. This is due to the number of communications, which has an essential impact on the resulting time. In the NC algorithm the messages can be grouped into packets and sent together. It is a general experience that the time needed for delivering $t$ single messages is much higher than the time needed for delivering those messages grouped into one packet. The NDFS algorithm, on the other hand, does not admit such grouping. Another disadvantage of NDFS is that during the passing of messages all the processors are idle, while in the NC algorithm the computation can continue immediately after sending a message. Last but not least, in NDFS all but one processor are idle, whereas in NC all can compute concurrently. All the mentioned advantages of the NC algorithm manifest themselves especially for systems without cycles, where the whole graph has to be searched. This is in fact the desired property of our algorithm, as the state explosion manifests itself exactly in these cases. Both algorithms perform equally well on graphs with cycles.

We have carried out another set of tests in order to validate the scalability of the NC algorithm. The tests confirm that it scales well, i.e. the overall time needed for treating a graph decreases as the number of involved processors increases.
6 Conclusions

Parallel and distributed algorithms for reachability analysis and model checking have recently been investigated as a possible method to handle large state spaces. The core problem of automata based model checking is the detection of reachable accepting cycles in the state space. The classical depth first strategy provides a suitable approach to cycle detection in the sequential case. However, the depth first search approach is difficult to distribute.

The paper proposes a novel approach to the cycle detection problem in a distributed environment. The main idea is to transform the accepting cycle detection problem into the single-source shortest path problem in graphs with real-valued edge lengths, namely the negative cycle problem. We have proposed a scalable distributed algorithm to solve this problem and we have performed a series of experiments to evaluate its performance.

The performance of the algorithm was compared with a distributed DFS based algorithm. The experimental results show that the distributed algorithm based on negative cycle detection significantly outperforms the DFS based one, due to a higher degree of asynchronous parallelism, which allows the necessary communication to be optimised. DFS based algorithms rely on strict synchronisation.

In the future we aim to embed the algorithm in a suitable automata based 
verification tool (e.g. SPIN) to be able to test its applicability to a non-trivial 
series of real systems. Furthermore, we intend to explore various heuristics and 
implementation techniques to optimise its performance. 



References 

1. S. Aggarwal, R. Alonso, and C. Courcoubetis. Distributed reachability analysis 
for protocol verification environments. In Discrete Event Systems: Models and 
Application, volume 103 of LNCS, pages 40-56. Springer, 1987. 

2. J. Barnat, L. Brim, and J. Střibrná. Distributed LTL Model-Checking in SPIN.
In Proc. SPIN 2001, volume 2057 of LNCS, pages 200-216. Springer, 2001.

3. G. Behrmann, T. S. Hune, and F. W. Vaandrager. Distributed timed model checking - how the search order matters. In Proc. CAV 2000, volume 1855 of LNCS, pages 216-231. Springer, 2000.

4. S. Ben-David, T. Heyman, O. Grumberg, and A. Schuster. Scalable distributed on-the-fly symbolic model checking. In Proc. FMCAD 2000, 2000.

5. B. Bollig, M. Leucker, and M. Weber. Parallel model checking for the alternation free mu-calculus. In Proc. TACAS 2001, volume 2031 of LNCS, pages 543-558. Springer, 2001.

6. L. Brim, I. Černá, P. Krčál, and R. Pelánek. Distributed shortest path for directed graphs with negative edge lengths. Technical Report FIMU-RS-2001-01, Faculty of Informatics, Masaryk University Brno, http://www.fi.muni.cz/informatics/reports/, 2001.

7. S. Chaudhuri and C. D. Zaroliagis. Shortest path queries in digraphs of small 
treewidth. In Proc. ESA 1995, volume 979 of LNCS, pages 31-45. Springer, 1995. 

8. B. V. Cherkassky and A. V. Goldberg. Negative-cycle detection algorithms. Math- 
ematical Programming, (85):277-311, 1999. 




Distributed LTL Model Checking Based on Negative Cycle Detection 107 



9. E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Progress on the state explosion problem in model checking. In Informatics: 10 Years Back, 10 Years Ahead, volume 2000 of LNCS, pages 176-194. Springer, 2001.

10. E. Cohen. Efficient parallel shortest-paths in digraphs with a separator decompo- 
sition. Journal of Algorithms, 21(2):331-357, 1996. 

11. T. H. Cormen, Ch. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 1990.

12. A. Crauser, K. Mehlhorn, U. Meyer, and P. Sanders. A parallelization of Dijkstra’s 
shortest path algorithm. In Proc. MFCS 1998, volume 1450 of LNCS, pages 722- 
731. Springer, 1998. 

13. D. Kavvadias, G. Pantziou, P. Spirakis, and C. Zaroliagis. Efficient sequential and parallel algorithms for the negative cycle problem. In Proc. ISAAC 1994, volume 834 of LNCS, pages 270-278. Springer, 1994.

14. O. Grumberg, T. Heyman, and A. Schuster. Distributed model checking for mu-calculus. In Proc. 13th Conference on Computer-Aided Verification (CAV 2001), LNCS. Springer, 2001.

15. T. Heyman, D. Geist, O. Grumberg, and A. Schuster. Achieving scalability in 
parallel reachability analysis of very large circuits. In Proc. CAV 2000, volume 
1855 of LNCS, pages 20-35. Springer, 2000. 

16. G. J. Holzmann. The model checker SPIN. IEEE Transactions on Software Engi- 
neering, 23(5):279-295, 1997. 

17. G.J. Holzmann, D. Peled, and M. Yannakakis. On nested depth first search. In 
The Spin Verification System, pages 23-32. American Mathematical Society, 1996. 

18. F. Lerda and R. Sisto. Distributed-memory model checking with SPIN. In Proc. 
SPIN 1999, number 1680 in LNCS. Springer, 1999. 

19. U. Meyer and P. Sanders. Parallel shortest path for arbitrary graphs. In Proc. 
EUROPAR 2000. LNCS, 2000. 

20. K. Ramarao and S. Venkatesan. On finding and updating shortest paths distribu- 
tively. Journal of Algorithms, 13:235-257, 1992. 

21. J. H. Reif. Depth-first search is inherently sequential. Information Processing Letters, 20(5):229-234, 1985.

22. S.H. Roosta. Parallel processing and parallel algorithms. Springer, 2000. 

23. U. Stern and D. L. Dill. Parallelizing the Murφ verifier. In Proc. CAV 1997, volume 1254 of LNCS, pages 256-267. Springer, 1997.

24. R. Tarjan. Depth first search and linear graph algorithms. SIAM Journal on Computing, pages 146-160, 1972.

25. J. Träff and C. D. Zaroliagis. A simple parallel algorithm for the single-source shortest path problem on planar digraphs. In Proc. IRREGULAR-3 1996, volume 1117 of LNCS, pages 183-194. Springer, 1996.

26. M. Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program 
verification. In Proc. LICS 1986, pages 332-344. Computer Society Press, 1986. 




Computability and Complexity Results for a 
Spatial Assertion Language for Data Structures 



Cristiano Calcagno^{1,2}, Hongseok Yang^3, and Peter W. O'Hearn^1

1 Queen Mary, University of London
2 DISI, University of Genova
3 ROPAS, KAIST



Abstract. This paper studies a recently developed approach to reasoning about mutable data structures, which uses an assertion language with spatial conjunction and implication connectives. We investigate computability and complexity properties of a subset of the language, which allows statements about the shape of pointer structures (such as "there is a link from x to y") to be made, but not statements about the data held in cells (such as "x is a prime number"). We show that validity, even for this restricted language, is not r.e., but that the quantifier-free sublanguage is decidable. We then consider the complexity of model checking and validity for several fragments.



1 Introduction 

This paper studies a recently developed approach to reasoning about mutable data structures [9,5]. The assertion language includes spatial conjunction and implication connectives alongside those of classical logic, in the style of the logic of Bunched Implications [8]. The conjunction $P * Q$ is true just when the current heap can be split into disjoint components, one of which makes $P$ true and the other of which makes $Q$ true. The implication $P \mathrel{-\!\!*} Q$ says that whenever $P$ is true for a new or fresh piece of heap, $Q$ is true for the combined new and old heap. In addition, there is an atomic formula, the points-to relation $E \mapsto F, G$, which says that $E$ points to a cons cell holding $F$ in its car and $G$ in its cdr.

As a small example of $*$, 

$$(x \mapsto a, y) * (y \mapsto b, x)$$

describes a two-element circular linked list, with $a$ and $b$ in the data fields. The 
conjunction $*$ here requires $x$ and $y$ to be pointers to distinct and non-overlapping 
cells. For an example of $\mathrel{-\!\!*}$, 

$$(x \mapsto a, b) * ((x \mapsto c, b) \mathrel{-\!\!*} P)$$

says that $x$ points to a cell holding $(a, b)$, and that $P$ will hold if we update the 
car to $c$. 

The logic of [9,5,7] can be used to structure arguments in a way that leads 
to pleasantly simple proofs of pointer algorithms. But the assertion language 
that the logic uses to describe pre- and postconditions is itself new, and its properties have not been studied in detail. The purpose of this paper is to study 
computability and complexity problems for the language. 

We consider a pared-down sublanguage, which includes the points-to relation 
and equality as atomic predicates, but not arithmetic or other expressions or 
atomic predicates for describing data. We do this to separate out questions 
about the shapes of data structures themselves from properties of the data held 
in them. This also insulates us from decidability questions about the data. In 
our language we can write a formula that says that x points to a linked list with 
two nodes, but not a formula that says that the list is sorted. 

Our first result is that, even with these restrictions, the question of validity 
is not r.e. The spatial connectives are not needed for this negative result. This 
result might seem somewhat surprising, given the sparseness of the language; 
decidability would obtain immediately were we to omit the points-to relation. 
The proof goes by reduction from a well-known non-r.e. problem of finite model 
theory: deciding whether a closed first-order logic formula holds for all nonempty 
finite structures. 

This result has two consequences. The first is that it tells us that we cannot 
hope to find an axiomatic description of $\mapsto$ adequate to the whole language. The 
second is that we should look to sublanguages if we are to find a decidability 
result. 

Our second result is that the quantifier-free sublanguage is decidable. The 
main subtlety in the proof is the treatment of $\mathrel{-\!\!*}$, whose semantics uses a universal quantification over heaps. This is dealt with by a bounding result, which 
restricts the number of heaps that have to be considered to verify or falsify a 
formula. 

We then consider the complexity of model checking and validity. For the 
quantifier-free fragment and several sublanguages, both questions are shown to be 
PSPACE-complete. One fragment is described where the former is NP-complete 
and the latter $\Pi^P_2$-complete. We also remark on cases where (as in propositional 
calculus) model checking is linear and validity coNP-complete. 



2 The Model and the Assertion Language 

In this section we present a spatial assertion language and its semantics. The 
other sections study properties of fragments of this language. 

Throughout the paper we will use the following notation. A finite map $f$ 
from $X$ to $Y$ is written $f: X \rightharpoonup_{\mathrm{fin}} Y$, and $\mathrm{dom}(f)$ indicates the domain of $f$. 
The notation $f \# g$ means that $f$ and $g$ have disjoint domains, and in that case 
$f * g$ is defined by $(f * g)(x) = y$ iff $f(x) = y$ or $g(x) = y$. 

The syntax of expressions E and assertions P for binary heap cells is given 
by the following grammar: 

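$$E ::= x, y, \ldots \mid \mathsf{nil}$$

$$P ::= (E \mapsto E, E) \mid E = E \mid \mathsf{false} \mid P \Rightarrow P \mid \forall x.\, P \mid \mathsf{emp} \mid P * P \mid P \mathrel{-\!\!*} P$$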
Expressions are either variables or the constant nil. Assertions include equality, 
the usual connectives from first-order classical logic, and spatial connectives. The 
predicate $(E \mapsto E_1, E_2)$ asserts that $E$ is the only allocated cell and that it points 
to a binary heap cell containing $E_i$ in the $i$-th component. The assertion $\mathsf{emp}$ 
says that the heap is empty. The assertion $P_1 * P_2$ means that it is possible to 
split the current heap into disjoint sub-heaps making the two assertions true. The 
assertion $P_1 \mathrel{-\!\!*} P_2$ means that for each new heap disjoint from the current one 
and making $P_1$ true, the combined new and old heap makes $P_2$ true. 

The other logical connectives are expressible as usual as derived notation: 

$$\neg P = P \Rightarrow \mathsf{false} \qquad P_1 \wedge P_2 = \neg(P_1 \Rightarrow \neg P_2) \qquad \exists x.\, P = \neg(\forall x.\, \neg P)$$

Expressions and assertions for binary heap cells are interpreted in the following 
model: 

$$\begin{array}{rcl}
\mathit{Val} &=& \mathit{Loc} \cup \{\mathsf{nil}\} \\
\mathit{Stack} &=& \mathit{Var} \to \mathit{Val} \\
\mathit{Heap} &=& \mathit{Loc} \rightharpoonup_{\mathrm{fin}} \mathit{Val} \times \mathit{Val} \\
\mathit{State} &=& \mathit{Stack} \times \mathit{Heap}
\end{array}$$

Values are either locations or nil, and a state is composed of a stack and a 
heap. The heap is a finite map from locations to binary heap cells, whose domain 
indicates the locations that are allocated at the moment. The semantics of 
expressions and assertions is given in Table 1. 

Definition 1 (Validity). We say that $P$ is valid, written $\models P$, if $s, h \models P$ for 
all states $(s, h)$. 



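Table 1. Semantics of Expressions and Assertions 

$$\begin{array}{lcl}
[\![x]\!]s &=& s(x) \\
[\![\mathsf{nil}]\!]s &=& \mathsf{nil}
\end{array}$$

$$\begin{array}{lcl}
s, h \models (E \mapsto E_1, E_2) & \text{iff} & \mathrm{dom}(h) = \{[\![E]\!]s\} \text{ and } h([\![E]\!]s) = ([\![E_1]\!]s, [\![E_2]\!]s) \\
s, h \models E_1 = E_2 & \text{iff} & [\![E_1]\!]s = [\![E_2]\!]s \\
s, h \models \mathsf{false} & & \text{never} \\
s, h \models P_1 \Rightarrow P_2 & \text{iff} & \text{if } s, h \models P_1 \text{ then } s, h \models P_2 \\
s, h \models \mathsf{emp} & \text{iff} & \mathrm{dom}(h) = \emptyset \\
s, h \models P_1 * P_2 & \text{iff} & \text{there exist } h_1, h_2 \text{ such that } h_1 \# h_2,\ h_1 * h_2 = h,\ s, h_1 \models P_1 \text{ and } s, h_2 \models P_2 \\
s, h \models P_1 \mathrel{-\!\!*} P_2 & \text{iff} & \text{for all } h_1 \text{ such that } h \# h_1 \text{ and } s, h_1 \models P_1,\ s, h * h_1 \models P_2 \\
s, h \models \forall x.\, P & \text{iff} & \text{for any } v \in \mathit{Val},\ s[x \mapsto v], h \models P
\end{array}$$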
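As an illustration only, the clauses of Table 1 other than $\mathrel{-\!\!*}$ and $\forall$ (which quantify over infinitely many heaps or values) can be turned directly into a checker for finite heaps. The Python sketch below is not from the paper: the tuple encoding of assertions, the names `ev`, `splits` and `sat`, and the use of integers for locations and `None` for nil are all assumptions.

```python
from itertools import combinations

NIL = None  # the constant nil; locations are modelled as ints (an assumption)


def ev(e, s):
    """[[E]]s: 'nil' denotes NIL, any other name is looked up in the stack."""
    return NIL if e == "nil" else s[e]


def splits(h):
    """All pairs (h1, h2) with h1 # h2 and h1 * h2 = h."""
    locs = list(h)
    for r in range(len(locs) + 1):
        for d in combinations(locs, r):
            h1 = {l: h[l] for l in d}
            h2 = {l: h[l] for l in locs if l not in h1}
            yield h1, h2


def sat(s, h, p):
    """s, h |= p for the clauses of Table 1 other than -* and forall."""
    tag = p[0]
    if tag == "pt":  # (E |-> E1, E2): h consists of exactly this one cell
        _, e, e1, e2 = p
        l = ev(e, s)
        return set(h) == {l} and h[l] == (ev(e1, s), ev(e2, s))
    if tag == "eq":
        return ev(p[1], s) == ev(p[2], s)
    if tag == "false":
        return False
    if tag == "imp":
        return (not sat(s, h, p[1])) or sat(s, h, p[2])
    if tag == "emp":
        return not h
    if tag == "star":  # try every splitting of h into two disjoint parts
        return any(sat(s, h1, p[1]) and sat(s, h2, p[2]) for h1, h2 in splits(h))
    raise ValueError(f"unsupported connective: {tag}")
```

For example, with stack `{"x": 1, "y": 2, "a": 10, "b": 20}` and heap `{1: (10, 2), 2: (20, 1)}`, the circular-list assertion from the introduction, encoded as `("star", ("pt", "x", "a", "y"), ("pt", "y", "b", "x"))`, evaluates to `True`. The $*$ case tries all $2^{|\mathrm{dom}(h)|}$ splittings, so this naive checker is exponential.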
3 Undecidability 

The main result in this section is that the validity problem is not recursively 
enumerable even when the spatial connectives $*$, $\mathsf{emp}$ and $\mathrel{-\!\!*}$ do not appear in 
assertions. 

Theorem 1. The set of valid assertions is not recursively enumerable, even when the assertion language is restricted as follows: 

$$P ::= (E \hookrightarrow E, E) \mid E = E \mid \mathsf{false} \mid P \Rightarrow P \mid \forall x.\, P$$

where $(E \hookrightarrow E_1, E_2)$ is $(E \mapsto E_1, E_2) * \mathsf{true}$. 

Note that the theorem uses an intuitionistic variant $\hookrightarrow$ of the predicate $\mapsto$, because with the $\mapsto$ predicate alone we cannot express that a heap cell $l$ is allocated and contains $(v_1, v_2)$ without requiring that $l$ is the only allocated heap cell. The 
meaning of $(E \hookrightarrow E_1, E_2)$ is that a heap cell $E$ is allocated and contains $(E_1, E_2)$, 
but it need not be the only allocated cell. 

We prove the theorem by reducing the validity on nonempty finite structures 
of closed first-order logic formulas to validity for our restricted language. Then, 
the conclusion follows by a standard result from finite model theory [3]: 

Theorem 2 (Trakhtenbrot). Even if a signature consists only of one binary 
relation, the set of closed first-order logic formulas valid on all nonempty finite 
structures is not recursively enumerable. 

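The reduction goes by translating a first-order logic formula with a single binary 
relation $R$ to an assertion. Let $\varphi$ be a first-order logic formula, which is not 
necessarily closed. The translation $\mathit{rd}(\varphi)$ is given as follows: 

$$\begin{array}{rcl}
\mathit{rd}(\varphi) &=& (\exists x.\, (x \hookrightarrow \mathsf{nil}, \mathsf{nil})) \Rightarrow \mathit{prd}(\varphi) \\[4pt]
\mathit{prd}(R(x, y)) &=& (\exists z.\, (z \hookrightarrow x, y)) \wedge (x \hookrightarrow \mathsf{nil}, \mathsf{nil}) \wedge (y \hookrightarrow \mathsf{nil}, \mathsf{nil}) \\
\mathit{prd}(\varphi \Rightarrow \psi) &=& \mathit{prd}(\varphi) \Rightarrow \mathit{prd}(\psi) \\
\mathit{prd}(\mathsf{false}) &=& \mathsf{false} \\
\mathit{prd}(x = y) &=& (x = y) \wedge (x \hookrightarrow \mathsf{nil}, \mathsf{nil}) \\
\mathit{prd}(\exists x.\, \varphi) &=& \exists x.\, ((x \hookrightarrow \mathsf{nil}, \mathsf{nil}) \wedge \mathit{prd}(\varphi))
\end{array}$$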

Intuitively, the translation encodes the relation $R^{\mathcal{A}}$ and the universe $|\mathcal{A}|$ of a 
nonempty finite structure $\mathcal{A}$ by heap cells: each element in $|\mathcal{A}|$ is encoded as an 
allocated cell containing $(\mathsf{nil}, \mathsf{nil})$, and a related pair $(a_1, a_2)$ in $R^{\mathcal{A}}$ is encoded 
as an allocated cell containing $(x, y)$, where $x$ and $y$ are the encodings of $a_1$ and $a_2$, 
respectively. Note that the guard $(\exists x.\, (x \hookrightarrow \mathsf{nil}, \mathsf{nil}))$ in the definition of $\mathit{rd}(\varphi)$ 
models the fact that the universe of a finite structure must be nonempty. 
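A small Python sketch of this encoding follows, under the assumptions that locations are consecutive integers and nil is `None`. The helper `encode_structure` is hypothetical: it allocates the $(\mathsf{nil}, \mathsf{nil})$ cells for the elements first and then one cell per related pair; the returned map plays the role of the bijection $\gamma$ used in Lemma 1.

```python
NIL = None  # nil; locations are consecutive ints (an assumption of this sketch)


def encode_structure(universe, relation):
    """Encode a nonempty finite structure (|A|, R^A) as a heap.

    Each element a gets its own cell gamma(a) holding (nil, nil); each pair
    (a1, a2) in R^A gets a further cell holding (gamma(a1), gamma(a2)).
    Returns the heap and the map gamma."""
    if not universe:
        raise ValueError("the universe of a finite structure must be nonempty")
    heap, gamma = {}, {}
    for a in sorted(universe):
        gamma[a] = len(heap)  # next unused location
        heap[gamma[a]] = (NIL, NIL)
    for a1, a2 in sorted(relation):
        heap[len(heap)] = (gamma[a1], gamma[a2])
    return heap, gamma
```

For instance, the structure with universe `{"p", "q"}` and relation `{("p", "q")}` becomes the heap `{0: (None, None), 1: (None, None), 2: (0, 1)}`.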

The reduction becomes complete once we prove that for closed first-order 
logic formulas, the translation preserves and reflects validity. To show that validity is reflected, we prove a lemma which implies that for all nonempty finite 
structures $\mathcal{A}$ and environments $\eta$ (mapping variables to elements of $|\mathcal{A}|$), it is 
always possible to find a state $(s, h)$ so that 

$$\mathcal{A}, \eta \models \varphi \iff (s, h) \models \mathit{rd}(\varphi) \quad \text{for all closed first-order formulas } \varphi.$$



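Lemma 1. Let $\mathcal{A}$ be a nonempty finite structure for the signature $\{R\}$, where 
$R$ is a binary relation. For all heaps $h$ and sets $B$, $C$ of locations such that 

— $\{B, C\}$ is a partition of $\mathrm{dom}(h)$; 

— $\gamma$ is a bijection from $|\mathcal{A}|$ to $B$ such that $h(\gamma(a)) = (\mathsf{nil}, \mathsf{nil})$ for all $a \in |\mathcal{A}|$; 
and 

— $\delta$ is a bijection from $R^{\mathcal{A}}$ to $C$ such that $h(\delta(a_1, a_2)) = (\gamma(a_1), \gamma(a_2))$ for all 
$(a_1, a_2) \in R^{\mathcal{A}}$, 

we have 

$$\mathcal{A}, \eta \models \varphi \iff \gamma \circ \eta, h \models \mathit{rd}(\varphi)$$

for all first-order formulas $\varphi$ and environments $\eta$. 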



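Proof. Since the universe $|\mathcal{A}|$ is not empty, the guard $(\exists x.\, (x \hookrightarrow \mathsf{nil}, \mathsf{nil}))$ holds 
for the state $(\gamma \circ \eta, h)$. So it suffices to prove the following claim: for all first-order 
formulas $\varphi$, 

$$\mathcal{A}, \eta \models \varphi \iff \gamma \circ \eta, h \models \mathit{prd}(\varphi).$$

It is straightforward to show the claim using induction over the structure of 
$\varphi$. □ 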



Before showing that validity is preserved by the translation, we note that when 
$\varphi$ is closed, so is $\mathit{rd}(\varphi)$; so $\mathit{rd}(\varphi)$ is valid if and only if for all heaps $h$, there 
is some stack $s$ with $(s, h) \models \mathit{rd}(\varphi)$. Let $\varphi$ be a closed first-order formula and 
let $h$ be a heap. When $h$ does not have any cells containing $(\mathsf{nil}, \mathsf{nil})$, the guard 
$(\exists x.\, (x \hookrightarrow \mathsf{nil}, \mathsf{nil}))$ of $\mathit{rd}(\varphi)$ is always false; consequently, $(s, h) \models \mathit{rd}(\varphi)$ 
holds vacuously for all stacks $s$. The key idea to handle the other case, where $h$ has at 
least one cell containing $(\mathsf{nil}, \mathsf{nil})$, is to build a nonempty finite structure $\mathcal{A}$ and 
a stack $s$ such that $\varphi$ holds in $\mathcal{A}$ if and only if $(s, h) \models \mathit{rd}(\varphi)$. We construct 
such a stack simply by mapping every variable to the address of an allocated cell 
in $h$ containing $(\mathsf{nil}, \mathsf{nil})$; the following lemma then shows how to construct the 
needed structure. 

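Lemma 2. For all heaps $h$ and stacks $s$ such that $h(s(x))$ is defined and equal 
to $(\mathsf{nil}, \mathsf{nil})$ for all variables $x$, let $\mathcal{A}$ be the structure for the signature $\{R\}$ given 
by: 

— $|\mathcal{A}| = \{l \in \mathrm{dom}(h) \mid h(l) = (\mathsf{nil}, \mathsf{nil})\}$; and 

— $(l_1, l_2) \in R^{\mathcal{A}}$ iff $l_1, l_2$ are in $|\mathcal{A}|$ and $h(l) = (l_1, l_2)$ for some $l \in \mathrm{dom}(h)$. 

Then $\mathcal{A}, s \models \varphi$ iff $s, h \models \mathit{rd}(\varphi)$. 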
Proof. Note that the universe $|\mathcal{A}|$ cannot be empty, because at least one 
allocated heap cell in $h$ must contain $(\mathsf{nil}, \mathsf{nil})$; otherwise, no stack would satisfy 
the condition in the lemma. One consequence of this fact is that $(s, h) \models \mathit{rd}(\varphi)$ 
iff $(s, h) \models \mathit{prd}(\varphi)$. So it suffices to show that 

$$\mathcal{A}, s \models \varphi \iff s, h \models \mathit{prd}(\varphi),$$

which can be easily proved using induction over $\varphi$. □ 
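The structure of Lemma 2 can likewise be read off a heap mechanically. In this hypothetical Python helper (integer locations, `None` for nil, both assumptions), cells whose contents are neither $(\mathsf{nil}, \mathsf{nil})$ nor a pair of universe elements are simply ignored.

```python
NIL = None  # nil; locations are ints (an assumption of this sketch)


def decode_heap(heap):
    """Build the structure A of Lemma 2 from a heap.

    The universe is the set of cells holding (nil, nil); (l1, l2) is in R
    iff l1 and l2 are in the universe and some allocated cell holds (l1, l2)."""
    universe = {l for l, cell in heap.items() if cell == (NIL, NIL)}
    relation = {cell for cell in heap.values()
                if cell[0] in universe and cell[1] in universe}
    return universe, relation
```

On the heap `{0: (None, None), 1: (None, None), 2: (0, 1), 3: (1, 7)}` it returns the universe `{0, 1}` and the relation `{(0, 1)}`; the cell at 3 points outside the universe and contributes nothing.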



4 Decidable Fragment 



The undecidability result in the previous section indicates that, in order to obtain 
a decidable fragment of the assertion language, either quantifiers must be left 
out of the fragment or they must be used in a restricted manner. In this section, 
we consider the quantifier-free fragment of the assertion language, including the 
spatial connectives $\mathsf{emp}$, $*$ and $\mathrel{-\!\!*}$. The main result in the section is: 

Theorem 3. The validity of assertions is algorithmically decidable as 
long as the assertions are instances of the following grammar: 

$$P ::= (E \mapsto E, E) \mid E = E \mid \mathsf{false} \mid P \Rightarrow P \mid \mathsf{emp} \mid P * P \mid P \mathrel{-\!\!*} P$$



To prove the theorem, we need to show that there is an algorithm which takes an 
assertion following the grammar in the theorem and answers whether the assertion 
holds for all states. The main observation is that each assertion determines 
a finite set of states such that if the assertion holds for all states in the set, it 
indeed holds for all states. The proof proceeds in two steps: first we consider 
the case where an assertion $P$ and a state $(s, h)$ are given and an algorithm is 
supposed to answer whether $s, h \models P$; then we construct an algorithm which, 
given an assertion $P$, answers whether $s, h \models P$ holds for all states $(s, h)$. 
In the remainder of the section, we assume that all assertions follow the 
grammar given in Theorem 3. 

The problem of algorithmically deciding whether $s, h \models P$ holds, given $P$, $s$, $h$ 
as inputs, is not as straightforward as it seems because of $\mathrel{-\!\!*}$: when $P$ is of the 
form $Q \mathrel{-\!\!*} R$, the interpretation of $s, h \models P$ involves quantification over all heaps, 
which might require checking infinitely many possibilities. So the decidability proof 
mainly consists in showing that there is a finite bound, algorithmically determined 
by $Q$ and $R$. We first define the size of an assertion, which is used to give an 
algorithm to determine the bound. 



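Definition 2 (size of $P$). For an assertion $P$, we define the size of $P$, $|P|$, as 
follows: 

$$\begin{array}{rclcrcl}
|(E \mapsto E_1, E_2)| &=& 1 & \qquad & |E_1 = E_2| &=& 0 \\
|\mathsf{false}| &=& 0 & & |P \Rightarrow Q| &=& \max(|P|, |Q|) \\
|P * Q| &=& |P| + |Q| & & |P \mathrel{-\!\!*} Q| &=& |Q| \\
|\mathsf{emp}| &=& 1 & & & &
\end{array}$$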
The size of $P$ determines a bound on the number of heap cells we have to 
consider to determine whether $P$ is true or not. For instance, the size of $(x \mapsto \mathsf{nil}, \mathsf{nil}) * (y \mapsto \mathsf{nil}, \mathsf{nil})$ is 2; and to decide whether it or its negation is true, or 
whether it is merely satisfiable, requires us only to look at heaps of size up to 2. 
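Definition 2 is a straightforward recursion. A Python sketch, assuming a hypothetical tuple encoding of assertions with tags `"pt"`, `"eq"`, `"false"`, `"imp"`, `"emp"`, `"star"` and `"wand"` (for $\mathrel{-\!\!*}$); negation is encoded as `("imp", P, ("false",))`, so it leaves the size unchanged.

```python
def size(p):
    """|P| from Definition 2, over a tuple encoding of assertions."""
    tag = p[0]
    if tag in ("pt", "emp"):
        return 1
    if tag in ("eq", "false"):
        return 0
    if tag == "imp":
        return max(size(p[1]), size(p[2]))
    if tag == "star":
        return size(p[1]) + size(p[2])
    if tag == "wand":
        return size(p[2])  # only the consequent Q of P -* Q counts
    raise ValueError(f"unknown connective: {tag}")
```

On the example above, `size(("star", ("pt", "x", "nil", "nil"), ("pt", "y", "nil", "nil")))` is 2, and so is the size of its negation.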

The following proposition claims that there is a bounded number of heaps to 
check in the interpretation of $s, h \models Q \mathrel{-\!\!*} R$; the decidability result is an 
immediate corollary. Let $\mathit{ord}$ be an effective enumeration of $\mathit{Loc}$. 

Proposition 1. Given a state $(s, h)$ and assertions $Q, R$, let $X$ be $\mathit{FV}(Q) \cup \mathit{FV}(R)$ 
and $B$ a finite set consisting of the first $\max(|Q|, |R|)$ locations in $\mathit{Loc} - (\mathrm{dom}(h) \cup s(X))$, 
where the ordering is given by $\mathit{ord}$. Pick a value $v \in \mathit{Val} - s(X) - \{\mathsf{nil}\}$. 
Then $(s, h) \models Q \mathrel{-\!\!*} R$ holds iff for all $h_1$ such that 

— $h \# h_1$ and $(s, h_1) \models Q$; 

— $\mathrm{dom}(h_1) \subseteq B \cup s(X)$; and 

— for all $l \in \mathrm{dom}(h_1)$, $h_1(l) \in (s(X) \cup \{\mathsf{nil}, v\}) \times (s(X) \cup \{\mathsf{nil}, v\})$, 

we have that $(s, h * h_1) \models R$. 

To see why the proposition implies the decidability result, notice that there 
are only finitely many $h_1$'s satisfying the conditions, because both $B \cup s(X)$ and 
$s(X) \cup \{\mathsf{nil}, v\}$ are finite. Since all the other cases of $P$ only involve finitely many 
ways to satisfy $s, h \models P$, exhaustive search gives the decision algorithm. 
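The finiteness argument can be made concrete. The Python sketch below is a hypothetical helper, not the paper's algorithm verbatim: it enumerates the heaps $h_1$ that satisfy $h \# h_1$ and the last two conditions of Proposition 1, assuming integer locations ordered by `<` in place of $\mathit{ord}$ and `None` for nil. Checking $s, h_1 \models Q$ and $s, h * h_1 \models R$ on each candidate is left to a separate satisfaction test.

```python
from itertools import combinations, product

NIL = None  # locations are non-negative ints; int order plays the role of ord


def candidate_heaps(h, sX, bound, v):
    """Enumerate the heaps h1 that Proposition 1 asks us to try.

    h     : the current heap (dict from locations to pairs)
    sX    : the set s(X) of values of the free variables of Q and R
    bound : max(|Q|, |R|)
    v     : a location outside s(X), standing for every uninteresting value
    Yields every h1 with h # h1, dom(h1) within B | s(X), and cell contents
    drawn from s(X) | {nil, v}."""
    taken = set(h) | {x for x in sX if x is not NIL}
    B, loc = [], 0
    while len(B) < bound:  # the first `bound` locations outside dom(h) | s(X)
        if loc not in taken:
            B.append(loc)
        loc += 1
    domain = sorted(set(B) | {x for x in sX if x is not NIL and x not in h})
    values = list(set(sX) | {NIL, v})
    cells = list(product(values, repeat=2))
    for r in range(len(domain) + 1):
        for d in combinations(domain, r):
            for contents in product(cells, repeat=r):
                yield dict(zip(d, contents))
```

With $h$ empty, $s(X) = \{0\}$, bound 1 and $v = 99$, the candidate domain is $\{0, 1\}$ and each cell has $3^2 = 9$ possible contents, giving $1 + 2 \cdot 9 + 81 = 100$ candidates.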

The interesting direction of the proposition is “if”, because the only-if direction 
follows from the interpretation of $\mathrel{-\!\!*}$. Intuitively, the if direction of the 
proposition holds because the following three changes of heap cells do not affect 
the truth of either $Q$ or $R$: relocating “garbage” heap cells (those not in 
$s(X)$); de-allocating redundant garbage heap cells when there are more than 
$\max(|Q|, |R|)$ of them; and overwriting “uninteresting values” (those not in $s(X) \cup \{\mathsf{nil}\}$) 
by another uninteresting value ($v$). Then, for every heap $h_1'$ with $h \# h_1'$, 
there is a sequence of such changes which transforms $h_1'$ and $h * h_1'$ to $h_1$ and 
$h * h_1$, respectively, such that $h_1$ satisfies the last two conditions in the 
proposition. The proposition follows because each step in the sequence preserves the 
truth of both $Q$ and $R$; so $(s, h_1') \models Q$ implies $(s, h_1) \models Q$, and $(s, h_1 * h) \models R$ 
implies $(s, h_1' * h) \models R$. 

Corollary 1. Given a stack $s$ and an assertion $P$, checking $(s, h) \models P$ for all 
$h$ is decidable. 

Proof. The corollary holds because $s, h \models P$ for all $h$ iff $s, [\,] \models (\neg P) \mathrel{-\!\!*} \mathsf{false}$. □ 

For the decidability of checking $(s, h) \models P$ for all states $(s, h)$, we observe that 
the actual values of variables are not relevant to the truth of an assertion as long 
as the “relationship” of the values remains the same. We define a relation $\sim_X$ 
to capture this “relationship” formally. Intuitively, two states are related by $\sim_X$ 
iff the relationships among the values stored in variables in $X$ or in heap 
cells are the same in the two states. 
Definition 3 ($\sim_X$). For states $(s, h)$ and $(s', h')$ and a subset $X$ of $\mathit{Var}$, 
$(s, h) \sim_X (s', h')$ iff there exists a bijection $r$ from $\mathit{Val}$ to $\mathit{Val}$ such that $r(\mathsf{nil}) = \mathsf{nil}$; 
$r(s(x)) = s'(x)$ for all $x \in X$; and $(r \times r)(h(l)) = h'(r(l))$ for all $l \in \mathit{Loc}$.¹ 

Proposition 2. For all states $(s, h)$ and $(s', h')$ and all assertions $P$ such 
that $(s, h) \sim_{\mathit{FV}(P)} (s', h')$, if $(s, h) \models P$, then $(s', h') \models P$. 

Lemma 3. Given a state $(s, h)$ and an assertion $P$, let $B$ be the set consisting of 
the first $|\mathit{FV}(P)|$ locations in $\mathit{Loc}$, where the ordering is given by $\mathit{ord}$. Then there 
exists a state $(s', h')$ such that $s'(\mathit{Var} - \mathit{FV}(P)) \subseteq \{\mathsf{nil}\}$; $s'(\mathit{FV}(P)) \subseteq B \cup \{\mathsf{nil}\}$; 
and $(s, h) \sim_{\mathit{FV}(P)} (s', h')$. 

The decidability result follows from the above lemma. To see why, we note 
that because of the lemma, for all assertions $P$, there is a finite set of stacks such 
that if $(s, h) \models P$ for all stacks $s$ in the set and all heaps $h$, then $P$ holds for all 
states, including those whose stack is not in the set. Therefore, a decision algorithm 
is obtained by exhaustively checking, for each stack $s$ in the finite set, whether 
$(s, h) \models P$ holds for all heaps $h$, using the algorithm in Corollary 1. 

Corollary 2. Given an assertion $P$, checking $(s, h) \models P$ for all states $(s, h)$ 
is decidable. 
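Lemma 3 says that only stacks mapping the free variables of $P$ into $B \cup \{\mathsf{nil}\}$ (and every other variable to nil) need to be considered, and up to renaming of locations it is enough to try one stack per pattern of equalities among the free variables. A Python sketch of such an enumeration, assuming locations $0, 1, 2, \ldots$ in the order $\mathit{ord}$ and `None` for nil; the generator name is hypothetical, and each stack would still be checked against all heaps with the procedure of Corollary 1.

```python
NIL = None  # nil; locations are 0, 1, 2, ... in the order ord (an assumption)


def representative_stacks(fv):
    """Yield stacks over the variables fv (a list), mapping each variable to
    NIL or to one of the first len(fv) locations.  Fresh locations are
    introduced in increasing order, so no two yielded stacks differ only by
    a renaming of locations."""
    def go(i, stack, next_loc):
        if i == len(fv):
            yield dict(stack)
            return
        x = fv[i]
        for val in [NIL, *range(next_loc)]:  # nil or an already-used location
            stack[x] = val
            yield from go(i + 1, stack, next_loc)
        stack[x] = next_loc  # a fresh location
        yield from go(i + 1, stack, next_loc + 1)
        del stack[x]

    yield from go(0, {}, 0)
```

For two free variables this yields 5 stacks, and for three it yields 15: one per way of grouping the variables into equal values, with one optional group equal to nil.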

5 Complexity 

In this section we study the complexity of model checking for some fragments of 
the decidable logic of Section 4. 

We consider the following fragments, where $(E \not\hookrightarrow -)$ means that $E$ is not 
allocated ($s, h \models (E \not\hookrightarrow -)$ iff $[\![E]\!]s \notin \mathrm{dom}(h)$): 



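$$\begin{array}{llll}
\text{Language} & \text{Grammar} & \text{MC} & \text{VAL} \\ \hline
\mathcal{L} & P ::= (E \mapsto E, E) \mid (E \not\hookrightarrow -) \mid E = E \mid E \neq E \mid \mathsf{false} \mid P \wedge P \mid P \vee P \mid \mathsf{emp} & \text{P} & \text{coNP} \\
\mathcal{L}^{*} & P ::= \mathcal{L} \mid P * P & \text{NP} & \Pi^P_2 \\
\mathcal{L}^{\neg *} & P ::= \mathcal{L} \mid \neg P \mid P * P & \text{PSPACE} & \text{PSPACE} \\
\mathcal{L}^{\mathrel{-\!\!*}} & P ::= \mathcal{L} \mid P \mathrel{-\!\!*} P & \text{PSPACE} & \text{PSPACE} \\
\mathcal{L}^{\neg * \mathrel{-\!\!*}} & P ::= \mathcal{L} \mid \neg P \mid P * P \mid P \mathrel{-\!\!*} P & \text{PSPACE} & \text{PSPACE}
\end{array}$$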



Given one of the fragments $\mathcal{F}$ above, the corresponding model-checking problem $MC(\mathcal{F})$ is 
deciding whether $s, h \models P$ holds, given a state $(s, h)$ and an assertion $P \in \mathcal{F}$. 
The validity problem asks whether a formula is true in all states. In the above 
table, the second-last column reports the complexity of model checking and the 
last the complexity of validity. 

¹ The equality $(r \times r)(h(l)) = h'(r(l))$ means that if one side of the equation is 
defined, the other side is also defined and they are equal. 
The easy fragment is $\mathcal{L}$. Clearly $MC(\mathcal{L})$ can be solved in linear time by the 
obvious algorithm arising from the semantic definitions, and it is not difficult to 
show that validity is coNP-complete. As soon as we add $*$, model checking 
bumps up to NP-complete. The validity problem for $\mathcal{L}^*$ is $\Pi^P_2$-complete; we show 
the former, but consideration of the latter is omitted for brevity. It is possible 
to retain linear model checking when $*$ is restricted so that one conjunct is of 
the form $(E \mapsto E_1, E_2)$. The fragment $\mathcal{L}^{\neg * \mathrel{-\!\!*}}$ is the object of the decidability 
result of Section 4; a consequence of our results there is that model checking and 
validity can be decided in polynomial space. Below we show PSPACE-hardness 
of model checking for the two fragments $\mathcal{L}^{\neg *}$ and $\mathcal{L}^{\mathrel{-\!\!*}}$. It is a short step to show 
PSPACE-hardness for validity. 
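The linear-time model-checking algorithm for $\mathcal{L}$ mentioned above simply evaluates each clause once. A Python sketch, assuming a hypothetical tuple encoding (`"unalloc"` stands for $(E \not\hookrightarrow -)$ and `"neq"` for $\neq$), integer locations, `None` for nil, and constant-time dictionary lookups:

```python
NIL = None  # nil; locations are ints (an assumption of this sketch)


def ev(e, s):
    """[[E]]s: 'nil' denotes NIL, any other name is looked up in the stack."""
    return NIL if e == "nil" else s[e]


def mc_L(s, h, p):
    """Model checking for the fragment L.  Every subformula is visited once
    and each atomic test is O(1) dictionary work, so the cost is linear in p."""
    tag = p[0]
    if tag == "pt":  # (E |-> E1, E2): h is exactly this one cell
        return len(h) == 1 and h.get(ev(p[1], s)) == (ev(p[2], s), ev(p[3], s))
    if tag == "unalloc":  # (E not-hookrightarrow -): [[E]]s is not in dom(h)
        return ev(p[1], s) not in h
    if tag == "eq":
        return ev(p[1], s) == ev(p[2], s)
    if tag == "neq":
        return ev(p[1], s) != ev(p[2], s)
    if tag == "false":
        return False
    if tag == "and":
        return mc_L(s, h, p[1]) and mc_L(s, h, p[2])
    if tag == "or":
        return mc_L(s, h, p[1]) or mc_L(s, h, p[2])
    if tag == "emp":
        return not h
    raise ValueError(f"not in fragment L: {tag}")
```

Because there is no $*$ and no negation over spatial formulas, no subheaps ever need to be enumerated, which is what keeps the evaluation linear.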

5.1 $MC(\mathcal{L}^*)$ Is NP-Complete 

In this section we show directly that MC{C*) belongs to NP, and give a reduction 
from an NP-complete problem to it. 

Proposition 3. $MC(\mathcal{L}^*)$ is in NP. 

Proof. The only interesting part is deciding whether $s, h \models P * Q$ holds. The 
algorithm proceeds by choosing non-deterministically a set $D \subseteq \mathrm{dom}(h)$, determining 
a splitting of $h$ into two heaps $h_1$ and $h_2$ obtained by restricting $h$ to $D$ 
and to $\mathrm{dom}(h) - D$ respectively, and then checking $s, h_1 \models P$ and $s, h_2 \models Q$. □ 
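The guess in this proof is just the set $D$. The Python sketch below replaces the nondeterministic guess by a deterministic loop over all $2^{|\mathrm{dom}(h)|}$ candidates, so it shows what the certificate is rather than achieving polynomial time; the function names and the convention of passing checkers for $P$ and $Q$ as functions of a stack and a heap are assumptions.

```python
from itertools import combinations


def splittings(h):
    """Every choice of D within dom(h), giving h1 = h restricted to D and
    h2 = h restricted to dom(h) - D.  An NP machine guesses one D; this
    deterministic sketch tries all of them."""
    locs = sorted(h)
    for r in range(len(locs) + 1):
        for D in combinations(locs, r):
            h1 = {l: h[l] for l in D}
            h2 = {l: h[l] for l in locs if l not in h1}
            yield h1, h2


def check_star(s, h, check_p, check_q):
    """s, h |= P * Q, given checkers for P and Q (functions of s and a heap)."""
    return any(check_p(s, h1) and check_q(s, h2) for h1, h2 in splittings(h))
```

Each individual guess is verified with one call to each checker, which is the polynomial-time verification step the proof relies on.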

Definition 4. The problem SAT is, given a formula F from the grammar 

$$F ::= x \mid \neg x \mid F \wedge F \mid F \vee F$$

deciding whether it is satisfiable, i.e. whether there exists an assignment of 
boolean values to the free variables of F making F true. 

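Definition 5. The translation from formulas $F$ to assertions $P$ of $\mathcal{L}^*$ is defined 
by a function $\mathit{tr}(-)$:² 

$$\begin{array}{ll}
\mathit{tr}(x) = (x \mapsto \mathsf{nil}, \mathsf{nil}) * \mathsf{true} & \mathit{tr}(\neg x) = (x \not\hookrightarrow -) \\
\mathit{tr}(F_1 \wedge F_2) = \mathit{tr}(F_1) \wedge \mathit{tr}(F_2) & \mathit{tr}(F_1 \vee F_2) = \mathit{tr}(F_1) \vee \mathit{tr}(F_2)
\end{array}$$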

Proposition 4. A formula $F$ with variables $\{x_1, \ldots, x_n\}$ is satisfiable if and 
only if $s_0, h_0 \models \mathit{tr}(F) * \mathsf{true}$ holds, where $s_0$ maps distinct variables $x_i$ to distinct 
locations $l_i$, $\mathrm{dom}(h_0) = \{l_1, \ldots, l_n\}$ and $h_0(l_i) = (\mathsf{nil}, \mathsf{nil})$ for $i = 1, \ldots, n$. 

Proof. The truth of a boolean variable $x$ is represented by its being allocated; in 
the initial state $(s_0, h_0)$ all the variables are allocated. The formula $\mathit{tr}(F) * \mathsf{true}$ 
is true if and only if there exists a subheap $h'$ making $\mathit{tr}(F)$ true, and subheaps 
correspond to assignments of boolean values to the variables in $F$. □ 
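The correspondence between subheaps of $h_0$ and truth assignments can be checked on small instances. This Python sketch (hypothetical names, integer locations, `None` for nil) evaluates $\mathit{tr}(F) * \mathsf{true}$ at $(s_0, h_0)$ by trying every subheap, using the semantics of the two atomic translations directly, and compares the result with an ordinary truth-table search.

```python
from itertools import combinations, product

NIL = None  # nil; locations are ints (an assumption of this sketch)


def holds_tr(F, s, h):
    """Evaluate the assertion tr(F) of Definition 5 in state (s, h):
    tr(x) = (x |-> nil, nil) * true holds iff h(s(x)) = (nil, nil);
    tr(not x) = (x not-hookrightarrow -) holds iff s(x) is not in dom(h)."""
    tag = F[0]
    if tag == "var":
        return h.get(s[F[1]]) == (NIL, NIL)
    if tag == "not":
        return s[F[1]] not in h
    if tag == "and":
        return holds_tr(F[1], s, h) and holds_tr(F[2], s, h)
    if tag == "or":
        return holds_tr(F[1], s, h) or holds_tr(F[2], s, h)
    raise ValueError(f"unknown connective: {tag}")


def sat_via_heap(F, variables):
    """Proposition 4: F is satisfiable iff s0, h0 |= tr(F) * true."""
    s0 = {x: i for i, x in enumerate(variables)}  # distinct locations l_i
    h0 = {l: (NIL, NIL) for l in s0.values()}     # every variable allocated
    locs = sorted(h0)
    # tr(F) * true: some subheap of h0 satisfies tr(F); the rest goes to true
    return any(holds_tr(F, s0, {l: h0[l] for l in d})
               for r in range(len(locs) + 1)
               for d in combinations(locs, r))


def sat_truth_table(F, variables):
    """Reference check: try every assignment of booleans to the variables."""
    def value(F, a):
        tag = F[0]
        if tag == "var":
            return a[F[1]]
        if tag == "not":
            return not a[F[1]]
        if tag == "and":
            return value(F[1], a) and value(F[2], a)
        return value(F[1], a) or value(F[2], a)

    return any(value(F, dict(zip(variables, bits)))
               for bits in product([False, True], repeat=len(variables)))
```

For example, $x \wedge \neg x$ is rejected by both functions, while $(x \vee y) \wedge \neg x$ is accepted by both, via the subheap in which only $y$ is allocated.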

Since the translation and construction of $(s_0, h_0)$ can be performed in polynomial 
time, an immediate consequence is NP-hardness of $MC(\mathcal{L}^*)$, hence NP-completeness. 

² $\mathsf{true}$ can be expressed by $\mathsf{nil} = \mathsf{nil}$ in $\mathcal{L}^*$. 
5.2 $MC(\mathcal{L}^{\neg *})$ Is PSPACE-Complete 

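In this section PSPACE-hardness of $MC(\mathcal{L}^{\neg *})$ is proved by reducing a PSPACE-complete problem to it. Completeness follows from the fact that $MC(\mathcal{L}^{\neg * \mathrel{-\!\!*}})$ is 
in PSPACE and that $\mathcal{L}^{\neg *}$ is a sub-fragment of $\mathcal{L}^{\neg * \mathrel{-\!\!*}}$. 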

Definition 6. The problem QSAT is, given a closed formula G from the gram- 
mar 

$$F ::= x \mid \neg x \mid F \wedge F \mid F \vee F, \qquad G ::= \forall x_1.\, \exists y_1 \ldots \forall x_n.\, \exists y_n.\, F$$

deciding whether it is true. 

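Definition 7. The translation from formulas $G$ to assertions $P$ of $\mathcal{L}^{\neg *}$ is defined 
by a function $\mathit{tr}(-)$: 

$$\begin{array}{rcl}
\mathit{tr}(x) &=& (x \mapsto \mathsf{nil}, \mathsf{nil}) * \mathsf{true} \\
\mathit{tr}(\neg x) &=& (x \not\hookrightarrow -) \\
\mathit{tr}(F_1 \wedge F_2) &=& \mathit{tr}(F_1) \wedge \mathit{tr}(F_2) \\
\mathit{tr}(F_1 \vee F_2) &=& \mathit{tr}(F_1) \vee \mathit{tr}(F_2) \\
\mathit{tr}(\exists y_i.\, G) &=& ((y_i \mapsto \mathsf{nil}, \mathsf{nil}) \vee \mathsf{emp}) * \mathit{tr}(G) \\
\mathit{tr}(\forall x_i.\, G) &=& \neg(((x_i \mapsto \mathsf{nil}, \mathsf{nil}) \vee \mathsf{emp}) * \neg \mathit{tr}(G))
\end{array}$$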



Proposition 5. A closed formula $G$ is true if and only if $s_0, h_0 \models \mathit{tr}(G)$ holds, 
where $s_0$ maps distinct variables $x_i$ to distinct locations $l_i$, $\mathrm{dom}(h_0) = \{l_1, \ldots, l_n\}$ 
and $h_0(l_i) = (\mathsf{nil}, \mathsf{nil})$ for $i = 1, \ldots, n$. 

Proof. The truth of a boolean variable $x$ is represented by its being allocated; in 
the initial state $(s_0, h_0)$ all the variables are allocated. The only interesting cases 
are the quantifiers. The invariant is that $\mathit{tr}(\exists y_i.\, G)$ is checked in a state where 
$y_i$ is allocated; thus $((y_i \mapsto \mathsf{nil}, \mathsf{nil}) \vee \mathsf{emp}) * \mathit{tr}(G)$ holds iff $\mathit{tr}(G)$ holds either for 
the current state or for the state obtained by de-allocating $y_i$. In other words, $G$ 
holds either for $y_i$ true or for $y_i$ false. The translation of $(\forall x_i.\, -)$ is essentially 
$\neg(\exists x_i.\, \neg -)$. □ 
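The proof's reading of the quantifier cases can be mirrored in a small evaluator: an existential either keeps or de-allocates the variable's cell, and a universal requires both choices to succeed. This Python sketch is an illustration under stated assumptions, not the paper's construction: it evaluates $\mathit{tr}(G)$ semantically rather than building the assertion, uses integer locations and `None` for nil, and takes the quantifier prefix as a list of `(quantifier, variable)` pairs (all names are hypothetical).

```python
NIL = None  # nil; locations are ints (an assumption of this sketch)


def eval_matrix(F, s, h):
    """tr(x) holds iff s(x) holds (nil, nil); tr(not x) iff s(x) is unallocated."""
    tag = F[0]
    if tag == "var":
        return h.get(s[F[1]]) == (NIL, NIL)
    if tag == "not":
        return s[F[1]] not in h
    if tag == "and":
        return eval_matrix(F[1], s, h) and eval_matrix(F[2], s, h)
    return eval_matrix(F[1], s, h) or eval_matrix(F[2], s, h)


def eval_tr(prefix, F, s, h):
    """Evaluate tr(G) for G = prefix . F (Definition 7) in state (s, h).
    ((v |-> nil, nil) or emp) * P holds iff P holds in h itself or in h with
    the cell of v removed (possible only if that cell holds (nil, nil))."""
    if not prefix:
        return eval_matrix(F, s, h)
    (quant, v), rest = prefix[0], prefix[1:]
    l = s[v]
    options = [h]
    if h.get(l) == (NIL, NIL):
        options.append({k: c for k, c in h.items() if k != l})
    if quant == "exists":  # ((v |-> nil, nil) or emp) * tr(G)
        return any(eval_tr(rest, F, s, h2) for h2 in options)
    # forall: not(((v |-> nil, nil) or emp) * not tr(G))
    return all(eval_tr(rest, F, s, h2) for h2 in options)


def qsat_via_heap(prefix, F):
    """Proposition 5: start from a heap in which every variable is allocated."""
    s0 = {v: i for i, (_, v) in enumerate(prefix)}
    h0 = {l: (NIL, NIL) for l in s0.values()}
    return eval_tr(prefix, F, s0, h0)
```

For example, $\forall x.\, \exists y.\, (x \vee \neg y) \wedge (\neg x \vee y)$ evaluates to true (choose $y$ equal to $x$), while $\forall x.\, \exists y.\, x \wedge y$ evaluates to false.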

Observing that the translation and construction of $(s_0, h_0)$ can be performed in 
polynomial time, we have shown PSPACE-hardness of $MC(\mathcal{L}^{\neg *})$. 

5.3 $MC(\mathcal{L}^{\mathrel{-\!\!*}})$ Is PSPACE-Complete 

In analogy with the previous section, a translation from QSAT to $MC(\mathcal{L}^{\mathrel{-\!\!*}})$ is 
presented. 

This case is more complicated, since $\mathrel{-\!\!*}$ provides a natural way of representing 
universal quantifiers, but there is no immediate way to represent existentials. Our 
solution is to use two variables $x_t$ and $x_f$ to represent a boolean variable $x$. There 
are three admissible states: 

— initial, when neither $x_t$ nor $x_f$ is allocated; 

— true, when $x_t$ is allocated and $x_f$ is not; 

— false, when $x_f$ is allocated and $x_t$ is not. 

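We use some auxiliary predicates: 

$$\begin{array}{rcl}
(x \hookrightarrow -) &=& ((x \mapsto \mathsf{nil}, \mathsf{nil}) \mathrel{-\!\!*} \mathsf{false}) \wedge (x \neq \mathsf{nil}) \\
I_x &=& (x_t \not\hookrightarrow -) \wedge (x_f \not\hookrightarrow -) \\
\mathit{OK}_x &=& ((x_t \hookrightarrow -) \wedge (x_f \not\hookrightarrow -)) \vee ((x_f \hookrightarrow -) \wedge (x_t \not\hookrightarrow -))
\end{array}$$

The meaning of $(x \hookrightarrow -)$ is that it is not possible to extend the current heap 
with $x$ pointing to $(\mathsf{nil}, \mathsf{nil})$, i.e. $x$ is allocated; $I_x$ means that $x$ is in the initial 
state, and $\mathit{OK}_x$ means that $x$ is either in state true or in state false. 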
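The three admissible states can be recognised from which of $x_t$ and $x_f$ are allocated. The Python sketch below tests allocation directly instead of through $\mathrel{-\!\!*}$ (the two agree, since a cell at a non-nil $s(x)$ can be added exactly when $s(x) \notin \mathrm{dom}(h)$). Naming the two variables with the suffixes `_t` and `_f`, and the function names, are assumptions of the sketch.

```python
NIL = None  # nil; locations are ints (an assumption of this sketch)


def allocated(s, h, x):
    """(x hookrightarrow -): s(x) is a non-nil location in dom(h).
    Semantically ((x |-> nil, nil) -* false) /\\ x != nil: a cell at s(x)
    can be added to h exactly when s(x) is not already allocated."""
    return s[x] is not NIL and s[x] in h


def is_initial(s, h, x):
    """I_x: neither x_t nor x_f is allocated."""
    return not allocated(s, h, x + "_t") and not allocated(s, h, x + "_f")


def is_ok(s, h, x):
    """OK_x: exactly one of x_t and x_f is allocated."""
    return allocated(s, h, x + "_t") != allocated(s, h, x + "_f")


def state_of(s, h, x):
    if is_initial(s, h, x):
        return "initial"
    if is_ok(s, h, x):
        return "true" if allocated(s, h, x + "_t") else "false"
    return "inadmissible"  # both x_t and x_f allocated
```

With $s(x_t) = 0$ and $s(x_f) = 1$, the empty heap gives "initial", a heap allocating only location 0 gives "true", and one allocating only location 1 gives "false".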

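Definition 8. Given a closed formula $\forall x_1.\, \exists y_1 \ldots \forall x_n.\, \exists y_n.\, F$, define the ordered set $V = \{x_1 < y_1 < \ldots < x_n < y_n\}$. Write $S^{op}$ for $V - S$ when $S \subseteq V$. 
Define $\{< x\} = \{x' \in V \mid x' < x\}$. The predicates are extended as follows: 

$$I_S = \bigwedge_{x \in S} I_x \qquad \mathit{OK}_S = \bigwedge_{x \in S} \mathit{OK}_x$$

The translation is defined by a function $\mathit{tr}(-)$: 

$$\begin{array}{rcl}
\mathit{tr}(x) &=& (x_t \hookrightarrow -) \\
\mathit{tr}(\neg x) &=& (x_f \hookrightarrow -) \\
\mathit{tr}(F_1 \wedge F_2) &=& \mathit{tr}(F_1) \wedge \mathit{tr}(F_2) \\
\mathit{tr}(F_1 \vee F_2) &=& \mathit{tr}(F_1) \vee \mathit{tr}(F_2) \\
\mathit{tr}(\forall x_i.\, G) &=& (\mathit{OK}_{\{x_i\}} \wedge I_{\{x_i\}^{op}}) \mathrel{-\!\!*} \mathit{tr}(G) \\
\mathit{tr}(\exists y_i.\, G) &=& {\sim}(\mathit{OK}_{\{<y_i\}} \wedge I_{\{<y_i\}^{op}} \wedge {\sim}(\mathit{OK}_{\{<y_i\} \cup \{y_i\}} \wedge I_{(\{<y_i\} \cup \{y_i\})^{op}} \wedge \mathit{tr}(G)))
\end{array}$$

where ${\sim}P$ is short for $P \mathrel{-\!\!*} \mathsf{false}$. 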

Intuitively, the translation of $x$ says that $x$ is in state true, and the translation 
of $\neg x$ says that $x$ is in state false. For $(\forall x_i.\, G)$, the invariant is that $\mathit{OK}_{\{<x_i\}}$ 
and $I_{\{<x_i\}^{op}}$ hold, and the translation says that $G$ holds after extending the current 
heap with any new heap containing only $x_i$ in an OK state (i.e. true or false). 
For $(\exists y_i.\, G)$, the invariant is that $\mathit{OK}_{\{<y_i\}}$ and $I_{\{<y_i\}^{op}}$ hold. The formula ${\sim}(P \wedge Q)$ 
implies that when $P$ holds in a new heap, that heap does not satisfy $Q$; in 
particular, if $P$ is $\mathit{OK}_{\{<y_i\}} \wedge I_{\{<y_i\}^{op}}$ and $P$ holds in the current heap, ${\sim}(P \wedge Q)$ 
implies that inverting the boolean values of the variables in $\{<y_i\}$ makes $Q$ false. This case is the 
most complicated part of the translation, and involves a double negation. In words, 
it says that, given an initial heap $h_0$, inverting the boolean values of the variables 
in $\{<y_i\}$ leads to a heap $h_1$ which makes the following false: for every heap $h_2$ 
obtained from $h_1$ by inverting again the boolean values of the variables in $\{<y_i\}$ 
and by assigning some boolean value to $y_i$, $G$ does not hold in $h_2$. 

Proposition 6. A closed formula $G$ is true if and only if $s_0, [\,] \models \mathit{tr}(G)$ holds, 
where $s_0$ maps distinct variables $x_i$ to distinct locations $l_i$, and $[\,]$ is the empty 
heap. 
To obtain the PSPACE-hardness result, observe that the translation can be 
performed in polynomial time, since the size of each $I_S$ and $\mathit{OK}_S$ is linear in the 
number of variables. 

6 Future Work 

Possible directions for future work include incorporating heap variables that 
allow us to take snapshots of the heap, with suitable restrictions [6] to maintain 
decidability, and also recursive definitions or special atomic predicates [1] for 
describing paths through the heap. We also plan to investigate the relation of 
our approach to work on model checking mobile ambients [2]. Finally, it would 
be useful to integrate our results on counter-models with the recently developed 
tableaux proof theory for Bunched Implications [4]. 
Abstract. In this paper, we illustrate how nondeterminism can be used 
conveniently and effectively in designing efficient deterministic algorithms. 
In particular, our method gives an $O((5.7k)^k n)$ parameterized algorithm 
for the 3-D matching problem, which significantly improves the previous 
algorithm by Downey, Fellows, and Koblitz. The algorithm can be gen- 
eralized to yield an improved algorithm for the r-D matching problem 
for any positive integer r. The method can also be employed in designing 
deterministic algorithms for other optimization problems as well. 



1 Introduction 

Nondeterminism has been a central topic in the study of complexity theory, which 
led to the famous “P ≠ NP” problem. In this paper, we study nondeterminism 
from an “algorithmic” point of view, and demonstrate how nondeterminism 
can be used conveniently and effectively to design efficient deterministic algo- 
rithms. We illustrate in detail our techniques by studying the complexity of the 
parameterized 3-D matching problem. 

The 3-dimensional matching problem, abbreviated 3-D matching problem, is 
one of the six “basic” NP-complete problems according to Garey and Johnson 
[5]. Recently, Downey, Fellows, and Koblitz were the first to show that the r-D 
matching problem is fixed-parameter tractable [4]. They presented a parameterized algorithm for the problem based on families of hash functions. Their algorithm runs in time $O((rk)!\,(rk)^{3rk+1} n \log^4 n)$, where $k$ is the size of the matching 
sought, and $n$ the total number of tuples in the input set. 
** Supported in part by UGC of Hong Kong under Grant 9040228. 
The authors of [4] noted that it is possible to improve the running time of 
their algorithm by using better families of hash functions. Using these families of hash functions, the running time of their algorithm can be improved to 
$O(2^{O(rk)}(rk)!\, n \log^4 n)$. In particular, for the 3-D matching problem, the running 
time of their algorithm can be improved to $O(2^{O(k)}(3k)!\, n \log^4 n)$. 

We propose a completely different approach to develop an efficient deterministic parameterized algorithm for the 3-D matching problem. Our algorithm starts with a nondeterministic algorithm that solves the problem. The 
nondeterminism in the algorithm is then removed by a process called “de-nondetermination”. With a careful design of the nondeterministic algorithm 
and with a nontrivial implementation of the de-nondetermination, we are able 
to show that the parameterized 3-D matching problem can be solved by a deterministic algorithm in time $O((5.7k)^k n)$. This is an improvement over the 
best algorithm in [4] by an enormous factor (greater than /c^^log^n). Our algo- 
rithm can also be generalized to handle the r-D matching problem. More pre- 
cisely, our method yields an algorithm of running time 0{\/r — l(k^ + rn){{{r- 
l)r-i^r-2)/(gr-2(^2 — 1)))^) for the r-D matching problem. This is again a sig- 
nificant improvement over the best algorithm by a factor greater than log"^ n. 
The method can also be employed in designing efficient algorithms for other op- 
timization problems as well, such as packing and covering problems. 



2 A Nondeterministic Algorithm 

Let $S \subseteq X \times Y \times Z$ be a set of $n$ (ordered) triples $t_1, t_2, \ldots, t_n$ of symbols in 
$X \cup Y \cup Z$. Without loss of generality, we can assume that $|X| = |Y| = |Z| = n$ 
and that the symbol sets $X$, $Y$, and $Z$ are pairwise disjoint. Therefore, 
each symbol uniquely determines its own dimension. For a subset $M$ of $S$, we 
sometimes also say, without ambiguity, that a symbol $a$ is in $M$ if the symbol $a$ 
is contained in a triple in $M$. 

A triple $t_1$ conflicts with a triple $t_2$ if $t_1$ and $t_2$ agree on some dimension but 
$t_1 \neq t_2$. A matching $M$ in $S$ is a subset of triples such that no two triples in 
$M$ conflict with each other. The parameterized 3-D matching problem is to determine, for a given pair $(S, k)$, whether the set $S$ of triples contains a matching 
of $k$ triples. The problem is NP-complete if it is regarded as a general decision 
problem [5]. In the framework of fixed-parameter tractability theory [4], we assume that the parameter $k$ is much smaller than the size $n$ of the triple set 
$S$. Therefore, an algorithm solving the problem in time $O(f(k) n^c)$, where $f$ is 
a function of the parameter $k$ but independent of the input size $n$ and $c$ is a 
constant independent of the parameter $k$, is preferable to the straightforward 
algorithm of time $O(k n^k)$, which simply examines all subsets of size $k$ in the 
triple set $S$. 
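For comparison, the straightforward algorithm can be sketched in Python. The sketch relies on the assumption stated above that $X$, $Y$ and $Z$ are pairwise disjoint, so two distinct triples agree on a dimension exactly when they share a symbol; this gives an $O(k)$ matching test per subset and $O(kn^k)$ time overall. Triples are plain tuples of strings, and the function names are hypothetical.

```python
from itertools import combinations


def conflict(t1, t2):
    """t1 conflicts with t2 if they are distinct but agree on some dimension."""
    return t1 != t2 and any(a == b for a, b in zip(t1, t2))


def is_matching(M):
    """O(k) test for k distinct triples: since X, Y, Z are pairwise disjoint,
    two triples agree on a dimension iff they share a symbol."""
    used = set()
    for t in M:
        for a in t:
            if a in used:
                return False
            used.add(a)
    return True


def has_k_matching_naive(S, k):
    """Try all C(n, k) <= n^k subsets of size k, each checked in O(k) time."""
    return any(is_matching(M) for M in combinations(S, k))
```

On the triples `(x1,y1,z1)`, `(x1,y2,z2)`, `(x2,y2,z3)`, `(x3,y3,z3)` there is a matching of size 2 (the first and third) but none of size 3.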

In this section, we present a nondeterministic algorithm for the parameterized 
3-D matching problem. We first assume that the triple set $S$ has a matching $M_k$ 
of $k$ triples and discuss how such a matching can be found. 
A partial triple $t$ is a triple in which some of the symbols may be marked 
by an “unknown symbol $*$”. Note that each triple in the set $S$ is also a partial 
triple. A partial triple is incomplete if at least one of its symbols is a $*$, and a 
partial triple is a complete triple if it does not contain the symbol $*$. We first 
introduce the concepts of consistency and conflict for partial triples. 

Definition 1. Two partial triples $t = (a_1, a_2, a_3)$ and $t' = (a'_1, a'_2, a'_3)$ are consistent if for each $i = 1, 2, 3$, either $a_i = a'_i$ or one of $a_i$ and $a'_i$ (or both) is *. Two partial triples conflict with each other if they are not consistent but agree on a dimension.
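Definition 1 translates directly into two predicates. The Python sketch below is illustrative: it encodes a partial triple as a tuple with the string "*" for the unknown symbol, and it assumes that "agree on a dimension" means carrying the same non-* symbol there (two *'s do not count as agreement). For complete triples it reduces to the earlier notion of conflict.

```python
STAR = "*"  # the unknown symbol


def consistent(t, u):
    """Definition 1: in every dimension the symbols agree or one of them is '*'."""
    return all(a == b or STAR in (a, b) for a, b in zip(t, u))


def agree_on_dimension(t, u):
    """True if t and u carry the same non-'*' symbol in some dimension."""
    return any(a == b != STAR for a, b in zip(t, u))


def conflict(t, u):
    """Definition 1: not consistent, yet agreeing on some dimension."""
    return not consistent(t, u) and agree_on_dimension(t, u)


print(consistent(("x1", STAR, STAR), ("x1", "y1", "z1")))  # True
print(conflict(("x1", "y1", STAR), ("x1", "y2", "z1")))    # True: same x1, different y
print(conflict(("x1", STAR, STAR), ("x2", "y1", STAR)))    # False: no shared symbol
```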

A set P of partial triples is called a partial matching if there is a one-to- 
one mapping from P to a matching M in S such that each partial triple in P is 
consistent with the corresponding triple in M . In this case, we say that the partial 
matching P is consistent with the matching M . According to the definition, a 
partial matching can always be obtained from a consistent matching in S by 
replacing certain symbols by the symbol *. 

The basic idea of our nondeterministic algorithm is to start with a partial matching $P_k$ consisting of $k$ partial triples of the form (*, *, *), then to decide which symbol should replace each * symbol in the partial matching, thus obtaining a matching of $k$ triples in $S$. We first show how to obtain the first non-* symbol for each partial triple in $P_k$.

A matching $M_0$ in $S$ is maximal if every triple in $S - M_0$ conflicts with at least one triple in $M_0$. A maximal matching $M_0$ in $S$ can easily be constructed in linear time as follows. Start from an empty matching $M_0$ and mark each symbol as “unused”. Now go through the set $S$. For each triple $t$ in $S$ in which no symbol is “used”, add $t$ to the matching $M_0$ and mark the symbols in $t$ as “used”. The resulting matching $M_0$ is clearly a maximal matching in $S$. We shall see in the discussion below how maximal matchings prove to be very helpful in constructing a matching $M_k$ of $k$ triples. First we prove the following lemma.

Lemma 1. Let $M_0$ be a maximal matching in $S$ and $M_k$ be a matching of $k$ triples in $S$. Then every triple in $M_k$ has at least one symbol in $M_0$.

Proof. If a triple $t$ in $M_k$ has no symbol contained in $M_0$, then $t$ does not conflict with any triple in $M_0$. This contradicts the assumption that $M_0$ is a maximal matching in $S$. □
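The greedy construction of $M_0$ and the statement of Lemma 1 can be illustrated with a short Python sketch (hypothetical names, triples as tuples of symbol strings). The "used" marks are kept in a hash set, so the pass is linear in expectation. The tiny instance also shows that a maximal matching can be three times smaller than a matching $M_k$.

```python
def greedy_maximal_matching(S):
    """One pass over S: keep a triple iff none of its symbols is 'used' yet."""
    used = set()   # symbols already marked "used"
    M0 = []
    for t in S:
        if not any(a in used for a in t):
            M0.append(t)
            used.update(t)
    return M0


def every_triple_hits(M0, Mk):
    """Lemma 1: each triple of the matching Mk has at least one symbol in M0."""
    symbols_in_M0 = {a for t in M0 for a in t}
    return all(any(a in symbols_in_M0 for a in t) for t in Mk)


S = [("x1", "y2", "z3"), ("x1", "y1", "z1"), ("x2", "y2", "z2"), ("x3", "y3", "z3")]
M0 = greedy_maximal_matching(S)      # [("x1", "y2", "z3")]
Mk = S[1:]                           # a matching of 3 triples
print(M0, every_triple_hits(M0, Mk))
```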

We call $(M_k, \{a_1, \ldots, a_k\})$ a feasible pair if $M_k$ is a matching of $k$ triples in $S$ and $\{a_1, \ldots, a_k\}$ is a set of symbols of $S$ such that each triple in $M_k$ contains exactly one symbol in the set $\{a_1, \ldots, a_k\}$.

Lemma 2. Suppose that the triple set $S$ has a matching of $k$ triples. Let $M_0$ be a maximal matching of fewer than $k$ triples in $S$. Then there is a feasible pair $(M_k, \{a_1, \ldots, a_k\})$ such that all symbols in $\{a_1, \ldots, a_k\}$ are in $M_0$ and each triple in $M_0$ contains at least one symbol in the set $\{a_1, \ldots, a_k\}$.
Proof. Suppose the maximal matching $M_0$ contains $k' < k$ triples. Let $M_{\max}$ be a maximum matching in the triple set $S$. Since $S$ has a matching of $k$ triples, the matching $M_{\max}$ contains at least $k$ triples. Construct a bipartite graph $G = (V_1 \cup V_2, E)$ as follows. Each triple in the maximal matching $M_0$ is a vertex in $V_1$ and each triple in the maximum matching $M_{\max}$ is a vertex in $V_2$ (thus if a triple $t$ is in both $M_0$ and $M_{\max}$, then $t$ gives rise to two vertices in $G$). There is an edge in $E$ from a vertex $t$ in $V_1$ to a vertex $t'$ in $V_2$ if the triple $t$ in $M_0$ and the triple $t'$ in $M_{\max}$ have a symbol in common. Let $Q$ be a maximum graph matching in the graph $G$.

We first show that the maximum graph matching $Q$ contains $k'$ edges. In fact, if $Q$ contains fewer than $k'$ edges, then by Hall’s Theorem (see, for example, [1], page 72, Theorem 5.2), there is a subset $V'$ of $V_1$ such that $|N(V')| < |V'|$, where $N(V')$ is the set of vertices in $V_2$ that are adjacent to some vertex in $V'$. Now, since the vertices in $V'$ are not adjacent to any vertices in $V_2 - N(V')$, the set $(V_2 - N(V')) \cup V'$ would form a matching in $S$ that contains more triples than the matching $M_{\max}$. This contradicts the assumption that $M_{\max}$ is a maximum matching in $S$.

Therefore, the maximum graph matching $Q$ contains $k'$ edges. Let $t_1, \ldots, t_{k'}$ be the $k'$ vertices in $V_1$ and $t'_1, \ldots, t'_{k'}$ be the $k'$ vertices in $V_2$ such that $[t_i, t'_i]$ is an edge in $Q$ for $i = 1, \ldots, k'$. Let $a_i$ be a common symbol in the triples $t_i$ and $t'_i$ for $i = 1, \ldots, k'$. Pick any $k - k'$ triples $t'_{k'+1}, \ldots, t'_k$ from $M_{\max} - \{t'_1, \ldots, t'_{k'}\}$ (recall that $M_{\max}$ contains at least $k$ triples). The $k$ triples $t'_1, \ldots, t'_{k'}, t'_{k'+1}, \ldots, t'_k$ make a matching $M_k$ of $k$ triples in $S$. By Lemma 1, each of the triples $t'_j$, $j = k'+1, \ldots, k$, contains a symbol $a_j$ in the maximal matching $M_0$. Now $(M_k, \{a_1, \ldots, a_{k'}, a_{k'+1}, \ldots, a_k\})$ is a feasible pair in which all symbols of $\{a_1, \ldots, a_{k'}, a_{k'+1}, \ldots, a_k\}$ are in the maximal matching $M_0$, and each triple $t_i$ in $M_0 = \{t_1, \ldots, t_{k'}\}$ contains the symbol $a_i$ of the set $\{a_1, \ldots, a_{k'}, a_{k'+1}, \ldots, a_k\}$, for $i = 1, \ldots, k'$. □



Call a set $Y_k$ of $k$ symbols in $M_0$ a spanning set if each triple in $M_0$ contains at least one symbol in $Y_k$. According to Lemma 2, if $S$ contains a matching of $k$ triples, then there exist a matching $M_k$ of $k$ triples and a spanning set $Y_k$ of $M_0$ such that $(M_k, Y_k)$ is a feasible pair. Now here is the first step in constructing the matching $M_k$ of $k$ triples in $S$. First we construct a maximal matching $M_0$ in $S$. Suppose that the maximal matching $M_0$ contains $k'$ triples. By our assumption, $S$ has a matching $M_k$ of $k$ triples. Thus, we must have $k' \ge k/3$: by Lemma 1, each of the $k$ pairwise disjoint triples of $M_k$ contains one of the $3k'$ symbols of $M_0$ (see, for example, [8] for a proof of this simple fact). If $k' \ge k$, then we simply pick any $k$ triples in $M_0$ to make a matching of $k$ triples. On the other hand, if $k/3 \le k' < k$, then according to Lemma 2, there exist a partial matching $P_k$ of $k$ partial triples, consistent with the matching $M_k$, and a spanning set $Y_k$ of $M_0$, such that each partial triple in $P_k$ contains one symbol in $Y_k$ and two *’s. To construct the partial matching $P_k$, we start with a set of $k$ empty triples (i.e., triples containing * symbols only), then we nondeterministically guess a spanning set $Y_k$ of $M_0$, and for every triple in $P_k$ we replace a * symbol with a distinct symbol in $Y_k$ (note that each guessed symbol uniquely determines its own dimension in a partial triple in $P_k$). We will call the partial matching $P_k$ the starting partial matching.
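The two ingredients of the first step, spanning sets and the starting partial matching, can be sketched in Python as follows. This is a hypothetical illustration: a symbol is represented as a (dimension, name) pair so that each guessed symbol carries its own dimension, and the spanning sets are enumerated by brute force rather than by the recursive enumeration the authors use.

```python
from itertools import combinations

STAR = "*"


def spanning_sets(M0, k):
    """Yield every spanning set of k symbols in M0 (brute force, illustration only).

    A symbol is represented as a (dimension, name) pair.
    """
    symbols = [(d, a) for t in M0 for d, a in enumerate(t)]
    for Y in combinations(symbols, k):
        chosen = set(Y)
        if all(any((d, a) in chosen for d, a in enumerate(t)) for t in M0):
            yield Y


def starting_partial_matching(Y):
    """One partial triple per symbol of Y: the symbol in its dimension, '*' elsewhere."""
    P = []
    for d, a in Y:
        t = [STAR, STAR, STAR]
        t[d] = a
        P.append(tuple(t))
    return P


M0 = [("x1", "y2", "z3")]
for Y in spanning_sets(M0, 3):
    print(starting_partial_matching(Y))
```

In a deterministic simulation, each spanning set yielded here would be one branch of the nondeterministic guess.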

Beginning with the starting partial matching, we assume inductively that we have obtained a partial matching $P_k$ of $k$ partial triples in which each partial triple contains at most two *’s. In order to construct the matching $M_k$ by filling in the unknown symbols * in the partial matching $P_k$, we try to replace each incomplete partial triple $t'$ in $P_k$ by a consistent triple $t$ in the set $S$. We say that the partial triple $t'$ in $P_k$ is replaceable by a triple $t$ in $S$ if $t$ does not conflict with any partial triple in $P_k$ and $t'$ is the unique partial triple in $P_k$ that is consistent with $t$. We repeat this process of substituting replaceable partial triples in $P_k$ by triples in $S$ until there are no more replaceable partial triples in $P_k$, using the greedy algorithm Greedy-Filling given in Figure 1.



Algorithm. Greedy-Filling

INPUT: a partial matching P_k of k partial triples
OUTPUT: a matching M in S

1. Q_k = P_k;
2. for each triple t in the set S do
       if a partial triple t' in Q_k is replaceable by t
       then Q_k = Q_k - {t'} + {t};
3. let M be the set of complete triples in Q_k

Fig. 1. The algorithm Greedy-Filling



The running time of the algorithm Greedy-Filling can be bounded by $O(n)$ if for each symbol we keep a mark “unused” or “used”, and for each non-* symbol in $Q_k$ we also attach to it the other non-* symbols in the same partial triple in $Q_k$.
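Figure 1 can be transcribed into a short Python sketch. It is a hypothetical illustration, not the authors' implementation: partial triples are tuples with "*" for the unknown symbol, replaceability is checked by scanning all of $Q_k$ (so the sketch costs $O(k)$ per triple of $S$ instead of the $O(1)$ obtainable with the symbol marks described above), and only incomplete partial triples are ever replaced.

```python
STAR = "*"


def consistent(t, u):
    """Every dimension agrees or holds a '*' on one side."""
    return all(a == b or STAR in (a, b) for a, b in zip(t, u))


def conflict(t, u):
    """Same non-'*' symbol in some dimension, but not consistent."""
    agree = any(a == b != STAR for a, b in zip(t, u))
    return agree and not consistent(t, u)


def replaceable_index(Q, t):
    """Index of the incomplete partial triple in Q that is replaceable by t, else None.

    t must conflict with no partial triple in Q, and exactly one partial
    triple in Q may be consistent with t.
    """
    if any(conflict(t, q) for q in Q):
        return None
    hits = [i for i, q in enumerate(Q) if consistent(t, q)]
    if len(hits) == 1 and STAR in Q[hits[0]]:
        return hits[0]
    return None


def greedy_filling(P, S):
    """Figure 1, naively: O(k) work per triple of S."""
    Q = list(P)                                   # step 1: Q_k = P_k
    for t in S:                                   # step 2
        i = replaceable_index(Q, t)
        if i is not None:
            Q[i] = t                              # Q_k = Q_k - {t'} + {t}
    return [q for q in Q if STAR not in q]        # step 3: complete triples only


P = [("x1", STAR, STAR), (STAR, "y2", STAR), (STAR, STAR, "z3")]
S = [("x1", "y2", "z3"), ("x1", "y1", "z1"), ("x2", "y2", "z2"), ("x3", "y3", "z3")]
print(greedy_filling(P, S))
```

Note that the first triple of $S$ is consistent with all three partial triples, so it is not replaceable and is skipped.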

There are two possible cases for the matching $M$ constructed by the algorithm Greedy-Filling: either $M$ contains $k$ triples in $S$ or $M$ contains fewer than $k$ triples. The first case is simple: we have found a matching of $k$ triples in $S$, so we are done. Now we consider the second case. Suppose that the matching $M$ consists of $k' < k$ triples. First note that if $t$ is a complete triple in $P_k$, then $t$ must be contained in the constructed matching $M$.

By our assumption, the partial matching $P_k$ is consistent with the matching $M_k$. According to the algorithm Greedy-Filling, the matching $M$ is obtained from a subset $P'_k$ of $k'$ partial triples in $P_k$ by replacing the * symbols in $P'_k$ by proper symbols. Thus, the matching $M$ is consistent with the partial matching $P'_k$. We say that a symbol is a newly added symbol if it is in the matching $M$ but not in the partial matching $P_k$. Note that the subset $P'_k$ of $P_k$ can easily be identified from the matching $M$. Now, since the partial matching $P_k$ is consistent with the matching $M_k$, the partial matching $P'_k$ in $P_k$ is also consistent with a subset $M'_k$ of $k'$ triples in the matching $M_k$. Since $k' < k$, the set $M_k - M'_k$ is not empty. Let $t$ be any triple in the set $M_k - M'_k$. The triple $t$ is consistent with a partial triple $t'$ in $P_k - P'_k$. Thus, the triple $t$ is not contained in the matching $M$. By the algorithm Greedy-Filling, the triple $t$ conflicts with at least one triple $t''$ in $M$. Suppose that the triples $t$ and $t''$ share the same symbol $a$.

We claim that the symbol $a$ is a newly added symbol in the matching $M$. In fact, if $a$ is in the partial matching $P_k$, then, since $a$ is in the matching $M$, which is consistent with the partial matching $P'_k$, the symbol $a$ is also in $P'_k$. Consequently, the symbol $a$ is contained in a triple in the subset $M'_k$. This contradicts the assumption that $M_k$ is a matching, since the triple $t$ in the set $M_k - M'_k$ also contains the symbol $a$. This proves that the symbol $a$ is a newly added symbol in the matching $M$. By this reasoning, for each partial triple $t'$ in $P_k - P'_k$, which is consistent with a triple $t$ in $M_k - M'_k$, one of the symbols in $t$ that corresponds to a * in $t'$ is contained in the matching $M$. Moreover, this symbol must be a newly added symbol in $M$. Therefore, to decide which symbol should replace a * in $t'$, we nondeterministically guess a newly added symbol in $M$ and replace the corresponding * in $t'$ by this guessed symbol. Note that this process reduces the number of * symbols in $P_k$ by 1. On this new partial matching $P_k$ with one fewer *, we apply the algorithm Greedy-Filling again. By repeating this process at most $2k$ times, each time either constructing a matching of $k$ triples in $S$ or reducing the number of * symbols in $P_k$ by 1, we can eliminate all * symbols in the partial matching $P_k$, thus obtaining a matching of $k$ triples in $S$.
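The guessing step argued above can be made concrete in a small Python sketch under the same hypothetical tuple encoding. It lists the newly added symbols of $M$ together with their dimensions and yields every partial matching obtainable by one guess for the first partial triple of $P_k$ that is consistent with no triple of $M$. A deterministic simulation would try each yielded alternative in turn; the names are illustrative only.

```python
STAR = "*"


def consistent(t, u):
    """Every dimension agrees or holds a '*' on one side."""
    return all(a == b or STAR in (a, b) for a, b in zip(t, u))


def newly_added(P, M):
    """(dimension, symbol) pairs that occur in M but nowhere in the partial matching P."""
    in_P = {a for p in P for a in p if a != STAR}
    return [(d, a) for t in M for d, a in enumerate(t) if a not in in_P]


def one_step_refinements(P, M):
    """All partial matchings reachable by one guess.

    Take a partial triple of P that is consistent with no triple of M, and
    replace one of its '*' symbols by a newly added symbol of M belonging
    to that dimension.
    """
    for i, p in enumerate(P):
        if not any(consistent(p, m) for m in M):
            for d, a in newly_added(P, M):
                if p[d] == STAR:
                    q = list(p)
                    q[d] = a
                    yield P[:i] + [tuple(q)] + P[i + 1:]
            return  # only the first such partial triple is refined


P = [("x1", STAR, STAR), (STAR, STAR, "z2")]
M = [("x1", "y5", "z5")]                     # Greedy-Filling found only one triple
print(newly_added(P, M))                     # [(1, 'y5'), (2, 'z5')]
print(list(one_step_refinements(P, M)))      # [[('x1', '*', '*'), ('*', 'y5', 'z2')]]
```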

Can we do better than iterating this process $2k$ times? We show next that it suffices to iterate this process $k$ times. We start by showing that if each partial triple in $P_k$ contains at most one * symbol, then the matching $M_k$ can be constructed easily.

Lemma 3. Suppose that $P$ is a partial matching of $h$ partial triples such that each partial triple in $P$ contains at most one *. Then a (regular) matching in $S$ consistent with the partial matching $P$ can be constructed in time $O(h^3 + n)$.

Proof. By the definition of partial matchings, there is a one-to-one mapping from the partial matching $P$ to a matching $M$ of $h$ triples in $S$ such that each partial triple in $P$ is consistent with the corresponding triple in $M$. Let $P = P_0 \cup P_1$, where $P_0$ is the set of partial triples in $P$ that contain no * symbols and $P_1$ is the set of partial triples in $P$ that contain exactly one * symbol. By the definition, the matching $M$ can also be partitioned into $M = M_0 \cup M_1$, where $M_0 = P_0$ and $M_1$ is a matching consistent with the partial matching $P_1$. Suppose that the partial matching $P_1$ consists of $h' \le h$ partial triples. We only need to show how a matching $M'$ of $h'$ triples can be constructed from the partial matching $P_1$ such that $M' \cup P_0$ is a matching of $h$ triples in $S$.

Construct a bipartite graph $G$ from the partial matching $P_1$ as follows. The graph $G$ contains two kinds of vertices: each partial triple in $P_1$ is a vertex in $G$, and each symbol $a$ not in $P$ is a vertex in $G$ if there is a partial triple $t$ in $P_1$ such that the two non-* symbols in $t$ and the symbol $a$ make a triple in $S$. The vertices in $G$ corresponding to partial triples in $P_1$ will be called triple-vertices and the vertices in $G$ corresponding to symbols will be called symbol-vertices. There is an edge in $G$ from a triple-vertex $t$ in $P_1$ to a symbol-vertex $a$ if the two non-* symbols in $t$ and the symbol $a$ make a triple in $S$. Note that there are exactly $h'$ triple-vertices.

The graph $G$ has a graph matching of $h'$ edges: for example, the matching $M_1$ of $h'$ triples in $S$ consistent with the partial matching $P_1$ corresponds to a graph matching of $h'$ edges in $G$. Moreover, every maximum graph matching, which consists of exactly $h'$ edges, corresponds to a matching $M'$ of $h'$ triples in $S$ that is consistent with the partial matching $P_1$ and such that $M' \cup P_0$ is a matching of $h$ triples in $S$ (note that all symbol-vertices of the graph $G$ correspond to symbols not in $P$). Therefore, a matching of $h$ triples in $S$ can be constructed by constructing a maximum graph matching in the graph $G$.

If a triple-vertex $t$ in the graph $G$ is incident on more than $h'$ edges, then we can first construct a maximum graph matching (of $h' - 1$ edges) in the graph $G - \{t\}$ and then add one more edge incident to $t$ to make a graph matching of $h'$ edges in the graph $G$ (since $t$ has more than $h'$ neighbors, at least one of them is not used by the $h' - 1$ edges already chosen). Therefore, after $O(n)$-time preprocessing, we can assume, without loss of generality, that each triple-vertex in the graph $G$ is incident on at most $h'$ edges. Consequently, the graph $G$ contains at most $(h')^2$ edges and at most $(h')^2 + h'$ vertices ($h'$ triple-vertices and at most $(h')^2$ symbol-vertices). Using Hopcroft and Karp’s matching algorithm for bipartite graphs [7], which runs in time $O(m\sqrt{n})$ on graphs of $n$ vertices and $m$ edges, we conclude that a maximum graph matching in $G$ can be constructed in time $O((h')^3) = O(h^3)$. As a result, a matching of $h$ triples in the triple set $S$ can be constructed in time $O(h^3 + n)$. □
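The construction in the proof of Lemma 3 can be sketched in Python as follows. This is an illustrative version under stated assumptions: partial triples are tuples with "*" for the unknown symbol, $S$ is a list of distinct triples, and the maximum bipartite matching is found with simple augmenting paths (Kuhn's algorithm) rather than Hopcroft and Karp's algorithm cited in the proof. The degree-truncation preprocessing is omitted, so the sketch does not attain the $O(h^3 + n)$ bound.

```python
STAR = "*"


def complete_single_stars(P, S):
    """Sketch of Lemma 3: fill the single '*' of each partial triple in P.

    Returns the completed triples (in the order of P), or None if no
    completion exists.  Uses augmenting paths (Kuhn's algorithm).
    """
    in_P = {a for p in P for a in p if a != STAR}
    P1 = [i for i, p in enumerate(P) if STAR in p]       # triple-vertices
    adj = {}
    for i in P1:
        d = P[i].index(STAR)
        # symbol-vertices: symbols a not in P that complete P[i] to a triple of S
        adj[i] = [t[d] for t in S
                  if t[d] not in in_P
                  and all(t[j] == P[i][j] for j in range(3) if j != d)]
    owner = {}                                            # symbol -> triple-vertex

    def augment(i, visited):
        """Try to match triple-vertex i, rerouting earlier matches if needed."""
        for a in adj[i]:
            if a not in visited:
                visited.add(a)
                if a not in owner or augment(owner[a], visited):
                    owner[a] = i
                    return True
        return False

    for i in P1:
        if not augment(i, set()):
            return None
    result = list(P)
    for a, i in owner.items():
        t = list(P[i])
        t[t.index(STAR)] = a
        result[i] = tuple(t)
    return result


P = [("x1", STAR, "z1"), ("x2", STAR, "z2")]
S = [("x1", "y1", "z1"), ("x1", "y2", "z1"), ("x2", "y1", "z2")]
print(complete_single_stars(P, S))  # [('x1', 'y2', 'z1'), ('x2', 'y1', 'z2')]
```

In the example, the first partial triple initially takes y1 and is then rerouted to y2 by an augmenting path so that the second one can use y1.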

Therefore, beginning with the starting partial matching $P_k$, in which each partial triple contains exactly two *’s, we only need to replace one * symbol in each partial triple in $P_k$. Once every partial triple in the partial matching $P_k$ contains exactly one * symbol, we can apply Lemma 3 to construct a matching of $k$ triples in $S$ directly. This method reduces the number of guessed symbols from $2k$ to $k$.

To make this method possible, we do the following. Given the partial matching $P_k$ of $k$ partial triples, let $T_1$ be the set of $h$ partial triples in $P_k$ that contain at most one * symbol and let $T_2$ be the set of $h'$ partial triples in $P_k$ that contain two * symbols, where $k = h + h'$. Now, instead of applying the subroutine Greedy-Filling on $P_k$ directly to construct the matching $M$, we first construct a matching $M_1$ of $h$ triples from the partial matching $T_1$, then apply the subroutine Greedy-Filling on the partial matching $T_2$ to construct a matching $M_2$ of at most $h'$ triples such that $M = M_1 \cup M_2$ is a matching. Now note that every partial triple $t$ in $P_k$ that is not consistent with a triple in $M$ must be a partial triple in $T_2$, which has two * symbols. Thus, guessing a newly added symbol in $M$ to replace a * symbol in $t$ will make $t$ a partial triple with exactly one * symbol. Using this method, we never need to guess the third symbol of any partial triple in $P_k$.

We summarize the above discussion in the algorithm given in Figure 2.

Theorem 1. The algorithm Nondet-3D Matching solves the parameterized 
3-D matching problem if it makes all correct guesses. 
Algorithm. Nondet-3D Matching
INPUT: the set S of n triples and an integer k

1. construct a maximal matching M_0 in S;
2. case 1. |M_0| ≥ k
       output any k triples in M_0; stop.
3. case 2. |M_0| < k/3
       stop: S does not contain a matching of k triples.
4. case 3. k/3 ≤ |M_0| < k
       for each spanning set Y_k of k symbols in M_0
       4.1. construct from Y_k a partial matching P_k of k partial triples,
            each containing exactly two *'s;
       4.2. loop k times
              let P_k = T_1 ∪ T_2, where T_1 is the set of h partial triples in P_k
                that contain at most one *, and T_2 is the set of h' partial
                triples in P_k that contain exactly two *'s, k = h + h';
              construct a matching M_1 of h triples consistent with T_1;
              let S' be the triple set obtained from S by deleting triples
                that contain symbols in M_1;
              call Greedy-Filling to construct a matching M_2 from T_2 and S';
              M = M_1 ∪ M_2;
              if |M| = k
                then output M; stop;
              else if M contains no newly added symbols
                then stop: S does not contain a matching of k triples;
              else
                pick any triple t in P_k that is not consistent with a triple in M;
                guess a newly added symbol in M to replace a * in t;
       4.3. stop: S does not contain a matching of k triples.

Fig. 2. The nondeterministic algorithm for 3-D matching



Proof. According to the above analysis, if the triple set $S$ contains a matching $M_k$ of $k$ triples and the algorithm makes all its guesses correctly, then the algorithm will eventually end up with a matching of $k$ triples in $S$.

On the other hand, suppose that the triple set $S$ has no matching of $k$ triples. Since the algorithm Nondet-3D Matching concludes the existence of a matching of $k$ triples only when it actually constructs such a matching, the algorithm never draws an incorrect conclusion in this case. In fact, either the algorithm finds that the maximal matching $M_0$ constructed in step 1 contains fewer than $k/3$ triples, or it finds in step 4.2 that no new symbols can be added to the partial matching $P_k$. Once either of these conditions is detected, the algorithm correctly concludes that $S$ has no matching of $k$ triples. □
3 De-nondetermination

The de-nondetermination process removes the nondeterminism from a given nondeterministic algorithm and converts it into a deterministic algorithm. One way to do this is to count the number of binary bits guessed in the algorithm, then enumerate and check all possible combinations of these binary bits. However, this blind enumeration approach can be costly. In the following we show that a more careful implementation of the de-nondetermination process can result in a more efficient deterministic algorithm for the problem than the one resulting from blind enumeration. We adopt two techniques for efficient de-nondetermination: the first is to enumerate the valid situations more carefully instead of simply counting the number of guessed binary bits, and the second is to perform a more thorough combinatorial analysis to reduce the search space so that fewer guessed binary bits are needed.

First consider the computational complexity of the algorithm Nondet-3D Matching. Suppose that the triple set $S$ has $n$ triples. As explained in Section 2, the maximal matching $M_0$ can be constructed in time $O(n)$, and each execution of the algorithm Greedy-Filling takes time $O(n)$. Since the loop in step 4.2 is executed at most $k$ times, we conclude that the running time of the nondeterministic algorithm Nondet-3D Matching is bounded by $O(kn)$. We next analyse the deterministic running time of the algorithm.

Since steps 1, 2, and 3 of the algorithm Nondet-3D Matching take $O(n)$ deterministic time, it suffices to analyse the time taken to de-nondeterminize step 4 of the algorithm. Consider step 4.1. We count how many spanning sets there are and how they can be enumerated. Let $D(k', k)$ be the number of spanning sets of $k$ symbols in the matching $M_0$ of $k'$ triples. The function $D(k', k)$ satisfies the following recurrence relation:

$$D(k', k) = 3D(k'-1, k-1) + 3D(k'-1, k-2) + D(k'-1, k-3) \qquad (1)$$

with the boundary conditions $D(0, 0) = 1$ and $D(k', k) = 0$ for $k' > k$ or $k' < k/3$. This recurrence relation can be seen as follows. Fix any triple $t$ in $M_0$ and let $Y_k$ be a spanning set of $k$ symbols in $M_0$. Suppose that the triple $t$ contains exactly one symbol $a$ in the spanning set $Y_k$; then (recursively) the matching $M_0 - \{t\}$ has $D(k'-1, k-1)$ spanning sets of $k-1$ symbols, each of which together with the symbol $a$ makes a spanning set for $M_0$. Moreover, there are three different ways (one for each dimension) for the triple $t$ to contain exactly one symbol in a spanning set. Therefore, in total there are $3D(k'-1, k-1)$ spanning sets of $k$ symbols in $M_0$ such that the triple $t$ contains exactly one symbol in the spanning set. Similarly, there are $3D(k'-1, k-2)$ spanning sets of $k$ symbols in $M_0$ such that $t$ contains exactly two symbols in the spanning set, and there are $D(k'-1, k-3)$ spanning sets of $k$ symbols in $M_0$ such that $t$ contains exactly three symbols in the spanning set. This gives the recurrence relation (1).

It is easy to verify that $D(k', k) \le \alpha_0^k$, where $\alpha_0 = 3.8473\cdots$ is the positive root of the equation $x^3 - 3x^2 - 3x - 1 = 0$ (by induction on $k'$, since $3\alpha_0^{k-1} + 3\alpha_0^{k-2} + \alpha_0^{k-3} = \alpha_0^{k-3}(3\alpha_0^2 + 3\alpha_0 + 1) = \alpha_0^k$). Moreover, the spanning sets of $k$ symbols in $M_0$ can be systematically and recursively enumerated according to the recurrence relation (1). We conclude that step 4.1 of the algorithm Nondet-3D Matching can be implemented by a deterministic loop whose body is executed at most $\alpha_0^k < 3.85^k$ times.
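Recurrence (1) and the bound $D(k', k) \le \alpha_0^k$ are easy to check numerically. The Python sketch below (names hypothetical) evaluates $D$ with memoization and the base case $D(0, 0) = 1$ (the empty set spans the empty matching), compares it with a brute-force count of spanning sets for small $k'$, and uses the closed form $\alpha_0 = 1/(\sqrt[3]{2} - 1)$, which follows from $x^3 - 3x^2 - 3x - 1 = 2x^3 - (x+1)^3$. A finite check of course does not replace the inductive proof.

```python
from functools import lru_cache
from itertools import combinations


@lru_cache(maxsize=None)
def D(kp, k):
    """Spanning sets of k symbols in a matching of kp triples, via recurrence (1)."""
    if kp == 0:
        return 1 if k == 0 else 0          # only the empty set spans the empty matching
    if k < kp or 3 * kp < k:               # boundary: k' > k or k' < k/3
        return 0
    return 3 * D(kp - 1, k - 1) + 3 * D(kp - 1, k - 2) + D(kp - 1, k - 3)


def D_brute(kp, k):
    """Count k-subsets of the 3*kp symbols that hit every one of the kp triples."""
    triples = [{3 * i, 3 * i + 1, 3 * i + 2} for i in range(kp)]
    return sum(1 for Y in combinations(range(3 * kp), k)
               if all(t & set(Y) for t in triples))


ALPHA0 = 1 / (2 ** (1 / 3) - 1)   # root of x^3 - 3x^2 - 3x - 1 = 0, about 3.8473

for kp in range(1, 5):
    for k in range(kp, 3 * kp + 1):
        assert D(kp, k) == D_brute(kp, k)
        assert D(kp, k) <= ALPHA0 ** k
print(D(2, 3), round(ALPHA0, 4))   # 18 3.8473
```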

Now we consider step 4.2. In the first execution of the loop in step 4.2, the partial matching $P_k$ has exactly $2k$ * symbols. Therefore, the number of newly added symbols in the matching $M$ is at most $2(k-1)$ (note that the matching $M$ has fewer than $k$ triples and $M$ is consistent with a subset of $P_k$). Thus, there are at most $2(k-1)$ ways to pick a symbol in $M$ to replace a * in a triple $t$ in $P_k$. In general, in the $i$th execution of the loop in step 4.2, the partial matching $P_k$ contains $2k - i + 1$ * symbols, and the number of newly added symbols in the matching $M$ is at most $2(k-1) - i + 1$ (again note that the matching $M$ has fewer than $k$ triples and any triple in $P_k$ that is not consistent with a triple in $M$ has two * symbols). Thus, in the $i$th execution of the loop in step 4.2, we have at most $2(k-1) - i + 1$ different ways to pick a newly added symbol in $M$ to replace a * symbol in a triple in $P_k$ that is not consistent with any triple in $M$. Based on the above discussion, the loop in step 4.2 can be implemented as follows. For each execution of the loop in step 4.2, we start with a sequence of positive integers

$$d_1 d_2 \cdots d_k, \qquad 1 \le d_i \le 2(k-1) - i + 1 \text{ for } 1 \le i \le k, \qquad (2)$$

such that in the $i$th execution of the loop body in step 4.2, we pick the $d_i$th newly added symbol in the matching $M$ (if the matching $M$ has fewer than $d_i$ newly added symbols, we simply skip this sequence). The de-nondetermination of the last line in step 4.2 can be implemented by executing the loop in step 4.2 based on every possible sequence of positive integers of the form (2).
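As an illustration of this enumeration, the following Python sketch (hypothetical names) generates all sequences of the form (2) with `itertools.product` and confirms for small $k \ge 2$ that their number is $(2k-2)(2k-3)\cdots(k-1) = (2k-2)!/(k-2)!$. Each sequence would drive one deterministic run of the loop in step 4.2.

```python
from itertools import product
from math import factorial


def guess_sequences(k):
    """All sequences d_1 ... d_k of form (2): 1 <= d_i <= 2(k-1) - i + 1."""
    ranges = [range(1, 2 * (k - 1) - i + 2) for i in range(1, k + 1)]
    return product(*ranges)


for k in range(2, 7):
    count = sum(1 for _ in guess_sequences(k))
    assert count == factorial(2 * k - 2) // factorial(k - 2)
print(list(guess_sequences(2)))   # [(1, 1), (2, 1)]
```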

Therefore, step 4.2 of the algorithm Nondet-3D Matching can be de-nondeterminized into a deterministic step by a for loop over all possible sequences of positive integers of the form (2). On each execution of the for loop, the loop in step 4.2 is executed. Consequently, the loop body in step 4.2 of the algorithm is executed at most

$$k \cdot [2(k-1)] \cdot [2(k-1) - 1] \cdots [2(k-1) - k + 1] = \frac{k\,(2k-2)!}{(k-2)!}$$

times. Within each execution of the loop body, the sets $T_1$ and $T_2$ can be trivially constructed from the partial matching $P_k$ in time $O(k)$. According to Lemma 3, the matching $M_1$ of $h$ triples can be constructed from $T_1$ in time $O(h^3 + n) = O(k^3 + n)$. The subroutine Greedy-Filling takes time $O(n)$. In conclusion, in the corresponding deterministic algorithm, each execution of step 4.2 takes time bounded by

$$O\left((k^3 + n) \cdot \frac{k\,(2k-2)!}{(k-2)!}\right) = O\left(\frac{(k^4 + kn)\,(2k-2)!}{(k-2)!}\right).$$


Again using Stirling’s approximation, we get

$$\frac{(2k-2)!}{(k-2)!} \le \frac{(2k)!}{k!} \le 2\left(\frac{4k}{e}\right)^k,$$

where $e = 2.7182818\cdots$ is the natural logarithmic base.
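A quick numerical sanity check of this Stirling-type bound and of the constant used next, as a Python sketch: it verifies $(2k-2)!/(k-2)! \le (2k)!/k! \le 2(4k/e)^k$ for $2 \le k < 100$ using exact integer factorials, and evaluates $3.85 \cdot 4/e$. A finite check is not a proof; the function name is hypothetical.

```python
from math import e, factorial


def falling(k):
    """(2k-2)!/(k-2)!, the number of guess sequences of form (2)."""
    return factorial(2 * k - 2) // factorial(k - 2)


for k in range(2, 100):
    bound = 2 * (4 * k / e) ** k
    assert falling(k) <= factorial(2 * k) // factorial(k) <= bound

print(3.85 * 4 / e)   # about 5.665, below the constant 5.67 used in the text
```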
Combining this with the analysis of the main for loop in step 4, we conclude that the running time of the deterministic algorithm obtained from the de-nondetermination of the algorithm Nondet-3D Matching is bounded by

$$O\left(3.85^k (k^4 + kn)\left(\frac{4k}{e}\right)^k\right) = O\left((k^4 + kn)\left(\frac{15.4k}{e}\right)^k\right) = O((5.7k)^k n).$$

The last equality follows from the fact that $15.4/e < 5.67$, so that both $k^4(15.4k/e)^k$ and $k(15.4k/e)^k$ are of order $O((5.7k)^k)$. We conclude with the following theorem:

Theorem 2. The parameterized 3-D matching problem can be solved by a deterministic algorithm of running time bounded by $O((5.7k)^k n)$.

4 Generalization and Concluding Remarks 

The algorithm in Figure 2 for the 3-D matching problem can be generalized 
in a straightforward manner to give an algorithm for the general r-D matching 
problem. We have the following theorem: 

Theorem 3. The parameterized r-D matching problem can be solved by
a deterministic algorithm of running time 0(^r — l(k^ -\- rn)(((r — 
l)’'-ifc’'-2)/(e’'-2(//2- 1)))'=). 

The reader is referred to [3] for the detailed algorithm and its proof of correctness.

We have demonstrated, using the parameterized 3-D matching and r-D 
matching problems as examples, how nondeterminism can be used to develop 
efficient deterministic algorithms. Our algorithm improves the previous best al- 
gorithm by Downey, Fellows, and Koblitz [4], by a factor greater than k^^ log"* n. 
In particular, our algorithm for the 3-D matching problem has running time $O((5.7k)^k n)$, which is significantly faster than the algorithm in [4] of running time nlog^n). The techniques presented in the current paper are also applicable to other parameterized optimization problems. For example, it is not hard to see that the algorithms presented in this paper can be modified to give algorithms for packing problems, such as the r-D packing problem [5]. Moreover, these techniques can be adopted in solving covering problems, such as the vertex cover problem [2], which leads to a simpler and shorter presentation of the solutions.

One might argue that the techniques presented here are essentially exhaustive search techniques, like the standard techniques used in parameterized algorithms [4], and we would not disagree. However, we point out that, conceptually, our techniques provide a more natural and intuitive method for developing efficient deterministic algorithms for certain problems where applying direct exhaustive search may be lengthy and confusing.
Abstract. We investigate the Presburger liveness problems for nondeterministic reversal-bounded multicounter machines with a free counter (NCMFs). We show the following:

- The ∃-Presburger-i.o. problem and the ∃-Presburger-eventual problem are both decidable. So are their duals, the ∀-Presburger-almost-always problem and the ∀-Presburger-always problem.

- The ∀-Presburger-i.o. problem and the ∀-Presburger-eventual problem are both undecidable. So are their duals, the ∃-Presburger-almost-always problem and the ∃-Presburger-always problem.

These results can be used to formulate a weak form of Presburger linear temporal 
logic and develop its model-checking theories for NCMFs. They can also be 
combined with [12] to study the same set of liveness problems on an extended 
form of discrete timed automata containing, besides clocks, a number of reversal- 
bounded counters and a free counter. 



1 Introduction 

An infinite-state system can be obtained by augmenting a finite automaton with one 
or more unbounded storage devices. The devices can be, for instance, counters (unary 
stacks), pushdown stacks, queues, and/or Turing tapes. However, an infinite-state system 
can easily achieve Turing-completeness, e.g., when two counters are attached to a finite 
automaton (resulting in a “Minsky machine”). For these systems, even simple problems 
such as membership are undecidable. 

In the area of model-checking, the search for (efficient) techniques for verifying infinite-state systems has been an ongoing research effort. Much work has been devoted to investigating various restricted models of infinite-state systems that are amenable to automatic verification. The work is motivated by the successes of “efficient” model-checking techniques for finite-state systems such as hardware devices and reactive systems [20], and the need for developing practical techniques for deciding verification properties of infinite-state systems.

* Supported in part by NSF grant IRI-9700370

R. Hariharan, M. Mukund, and V. Vinay (Eds.): FSTTCS 2001, LNCS 2245, pp. 132-143, 2001.
(c) Springer-Verlag Berlin Heidelberg 2001

The infinite-state models that have been investigated include timed automata [1], 
pushdown automata [3,14], various versions of counter machines [5,13,18], and various 
queue machines [2,4,16,17,21]. 

Counter machines are considered a natural model for specifying reactive systems containing integer variables. They have also been found to have a close relationship to other popular models of infinite-state systems, such as timed automata [1]. In [6], it was shown that, as far as binary reachability (the set of configuration pairs such that one can reach the other) is concerned, a timed automaton can be transformed into a particular type of counter machine without nested cycles [5]. In contrast to [6], timed automata (with discrete time) are mapped to counter machines with reversal-bounded counters in [8]. In the case of dense time, the same mapping applies using some pattern technique [7].

Thus, studying various restricted models of counter machines may help researchers to develop verification theories concerning infinite-state systems such as timed automata augmented with unbounded storage [8].

In this paper, we focus on a class of restricted counter machines, called nondeter- 
ministic reversal-bounded multicounter machines with a free counter (NCMFs). More 
precisely, an NCMF M is a nondeterministic finite automaton augmented with a finite 
number of reversal-bounded counters (thus, in any computation, each counter can change
mode from nondecreasing to nonincreasing and vice-versa at most r times for some given 
nonnegative integer r) and one free counter (which need not be reversal-bounded). A 
fundamental result is that the emptiness problem for languages accepted by NCMFs is 
decidable [15]. But here we do not use NCMFs as language recognizers; instead, we are 
interested in the behaviors they generate. So, unless otherwise specified, an NCMF has 
no input tape. Reversal-bounded counters are useful in verification of reactive systems. 
For instance, a reversal-bounded counter can be used to count the number of times a 
particular external event occurs in a reactive system - in this case, the counter is simply 
0-reversal-bounded, i.e., non-decreasing. Allowing a free counter, together with other 
reversal-bounded counters, makes the reactive system infinite-state. More application 
issues of NCMFs and the results in this paper can be found at the end of the paper. 

The study of safety properties and liveness properties of infinite-state systems is of great importance in the area of formal verification. Safety properties look at only finite (execution) paths; mostly they can be reduced to reachability problems. In [18], it was shown that the Presburger safety analysis problem is decidable for NCMFs and their generalizations. A typical example of a Presburger safety property that we might want to verify for an NCMF $M$ with counters $x_1$, $x_2$, and $x_3$ is the following: starting from counter values satisfying $x_1 - x_2 + 8x_3 > 5$, $M$ can only reach counter values satisfying $x_1 + 2x_2 - 4x_3 < 8$.

In this paper, we systematically study a number of Presburger liveness problems for NCMFs. An example is an ∃-Presburger-i.o. problem like: Given an NCMF $M$ with counters $x_1$, $x_2$, and $x_3$, does there exist an ω-path (i.e., infinite execution path) $p$ for $M$ such that $x_1 + 2x_2 - 4x_3 < 8$ is satisfied on $p$ infinitely often? The research presented in this paper is inspired by the recent work in [12] that investigates the same set of Presburger liveness problems for discrete timed automata. But the techniques we develop here are completely different from the ones in [12]. Clocks in a discrete timed automaton, when considered as counters, are synchronous. So, in some way, a discrete timed automaton can be treated as a reversal-bounded multicounter machine (an NCMF without the free counter) [8]. The ability of an NCMF to use a free counter makes the Presburger liveness proofs much more complicated. The main results of this paper show that the ∃-Presburger-i.o. problem is decidable for NCMFs. This result leads us to conjecture that the ∃-Presburger-i.o. problem is also decidable for (discrete timed) pushdown processes when the counts on individual stack symbols are part of the Presburger property being verified [8].

The paper is organized as follows. Section 2 introduces the main definitions. Section
3 shows the decidability of the ∃-Presburger-i.o. and ∃-Presburger-eventual problems.
Section 4 generalizes the proofs in [12] to show the undecidability of the ∀-Presburger-i.o.
and the ∀-Presburger-eventual problems. Section 5 concludes the paper.



2 Preliminaries 



Let $X = \{x_0, \ldots, x_k\}$ be a finite set of integer variables. A formula $\sum_{0 \le i \le k} a_i x_i \,\#\, b$,
where the $a_i$ and b are integers, is called an atomic linear constraint if $\#$ is $>$ or $=$. The
formula is called an atomic mod-constraint if $\#$ is $\equiv_d$ for some $d > 0$. A linear-conjunction
is a conjunction of a finite number of atomic linear constraints. A linear-mod-conjunction
is a conjunction of a finite number of atomic linear constraints and
atomic mod-constraints. It is well known that a Presburger formula [19] (a first-order
formula over the integers with addition) can always be written in disjunctive normal form
over atomic linear constraints and atomic mod-constraints, i.e., as a disjunction of
linear-mod-conjunctions. A set P is Presburger-definable if there exists a Presburger formula
F on X such that P is exactly the set of solutions for X that make F true. It is well
known that the class of Presburger-definable sets does not change if quantification is
allowed. Hence, when considering Presburger formulas, we will allow quantifiers over
integer variables. A standard test on X is a Boolean combination of atomic tests of the
form $x \,\#\, c$, where $\#$ denotes $<$, $>$, $\le$, $\ge$, or $=$, c is an integer, and $x \in X$. Let $T_X$ be the
set of all standard tests on X.
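To make these notions concrete, the following Python sketch evaluates atomic linear constraints, atomic mod-constraints, and a disjunction of linear-mod-conjunctions at a given integer point. The class and function names (`LinearAtom`, `ModAtom`, `holds`) and the tuple encoding of coefficients are illustrative assumptions of ours. The sketch only evaluates a quantifier-free formula in disjunctive normal form; it is not a decision procedure for Presburger arithmetic. The example formulas are taken from the safety property in the introduction.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


@dataclass(frozen=True)
class LinearAtom:
    """Atomic linear constraint: sum_i a[i] * x[i]  #  b, with # in {'>', '='}."""
    a: Tuple[int, ...]
    op: str
    b: int


@dataclass(frozen=True)
class ModAtom:
    """Atomic mod-constraint: sum_i a[i] * x[i] is congruent to b modulo d, d > 0."""
    a: Tuple[int, ...]
    b: int
    d: int


Atom = Union[LinearAtom, ModAtom]


def linear_term(a: Sequence[int], x: Sequence[int]) -> int:
    return sum(ai * xi for ai, xi in zip(a, x))


def atom_holds(atom: Atom, x: Sequence[int]) -> bool:
    value = linear_term(atom.a, x)
    if isinstance(atom, ModAtom):
        # For d > 0, Python's % returns a value in [0, d), also for negative operands.
        return (value - atom.b) % atom.d == 0
    if atom.op == ">":
        return value > atom.b
    if atom.op == "=":
        return value == atom.b
    raise ValueError(f"unsupported comparison {atom.op!r}")


def holds(dnf: Sequence[Sequence[Atom]], x: Sequence[int]) -> bool:
    """Evaluate a disjunction of linear-mod-conjunctions at the point x."""
    return any(all(atom_holds(atom, x) for atom in conj) for conj in dnf)


# Counter vectors below are (x1, x2, x3), as in the safety example.
# x1 - x2 + 8*x3 > 5
start = [[LinearAtom((1, -1, 8), ">", 5)]]
# x1 + 2*x2 - 4*x3 < 8, written with '>' as -x1 - 2*x2 + 4*x3 > -8,
# conjoined with the example mod-constraint "x1 is even".
target = [[LinearAtom((-1, -2, 4), ">", -8), ModAtom((1, 0, 0), 0, 2)]]
```

Rewriting `<` as `>` with negated coefficients shows why restricting atomic linear constraints to $>$ and $=$ loses no generality over the integers.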

A nondeterministic multicounter machine (NCM) M is a nondeterministic machine 
with a finite set of (control) states and a finite number of integer counters. Each counter 
can add 1, subtract 1, or stay unchanged. M can also test whether a counter is equal 
to, greater than, or less than an integer constant by performing a standard test. Without 
loss of generality, in this paper we consider M without event labels on transitions, since 
these labels can be built into the control states. 

Formally, a nondeterministic multicounter machine (NCM) M is a tuple $\langle S, X, E \rangle$,
where S is a finite set of (control) states, X is a finite set of integer counters, and
$E \subseteq S \times T_X \times \{-1, 0, 1\}^{|X|} \times S$ is a finite set of edges or transitions. Each edge
$(s, t, incr, s')$ denotes a transition from state s to state s', with $t \in T_X$ being the test or
the enabling condition, and $incr \in \{-1, 0, 1\}^{|X|}$ denoting the effect of the edge: each counter
in X is incremented by the amount specified in the vector incr.

The semantics of NCMs is defined as follows. We use V to denote counter vectors
(i.e., vectors of counter values), and $V_i$ to denote the value of counter $x_i$ in V, for
$0 \le i < |X|$. A configuration $(s, V) \in S \times \mathbb{Z}^{|X|}$ is a pair of a control state s and a
counter vector V. $(s, V) \to_M (s', V')$ denotes a one-step transition from configuration
$(s, V)$ to configuration $(s', V')$ satisfying the following conditions:

- There is an edge $(s, t, incr, s')$ in M connecting state s to state s',

- The enabling condition of the edge is satisfied, that is, $t(V)$ is true,

- Each counter changes according to the edge, i.e., $V' = V + incr$.

A path is a finite sequence

$$(s_0, V^0) \cdots (s_n, V^n)$$

such that $(s_i, V^i) \to_M (s_{i+1}, V^{i+1})$ for each $0 \le i \le n-1$. An ω-path is an infinite
sequence $(s_0, V^0) \cdots (s_n, V^n) \cdots$ such that each prefix $(s_0, V^0) \cdots (s_n, V^n)$ is a path.
We write $(s, V) \leadsto_M (s', V')$ if the configuration $(s, V)$ reaches the configuration
$(s', V')$ through a path in M. The binary relation $\leadsto_M$ is called binary reachability.
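The following Python sketch spells out the one-step relation and a bounded exploration of reachable configurations. As simplifying assumptions of ours, a test is modelled as an arbitrary Python predicate on the counter vector rather than a Boolean combination of atomic tests, and the names `Edge`, `successors`, and `reachable_within` are hypothetical. A bounded search only under-approximates binary reachability; it is not the Presburger characterization of Theorem 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Vector = Tuple[int, ...]
Config = Tuple[str, Vector]  # (control state s, counter vector V)


@dataclass(frozen=True)
class Edge:
    src: str
    test: Callable[[Vector], bool]  # enabling condition t(V)
    incr: Vector                    # each entry is -1, 0, or 1
    dst: str


def successors(edges: List[Edge], config: Config) -> Set[Config]:
    """All (s', V') with (s, V) ->_M (s', V')."""
    s, v = config
    result = set()
    for e in edges:
        if e.src == s and e.test(v):
            # V' = V + incr, componentwise
            result.add((e.dst, tuple(vi + di for vi, di in zip(v, e.incr))))
    return result


def reachable_within(edges: List[Edge], start: Config, steps: int) -> Set[Config]:
    """Configurations reachable from `start` by paths with at most `steps` edges."""
    seen = {start}
    frontier = {start}
    for _ in range(steps):
        frontier = {c2 for c in frontier for c2 in successors(edges, c)} - seen
        seen |= frontier
    return seen


# One counter x0: count up to 3 in state "up", then count down to 0 in "down".
edges = [
    Edge("up", lambda v: v[0] < 3, (1,), "up"),
    Edge("up", lambda v: v[0] == 3, (0,), "down"),
    Edge("down", lambda v: v[0] > 0, (-1,), "down"),
]
```

In this example, ("down", (0,)) is first reached after seven steps, and exactly eight configurations are reachable from ("up", (0,)).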

It is well known that counter machines with two counters have an undecidable halting 
problem. Thus, in order to investigate any nontrivial decidable verification problems 
for NCMs, we have to restrict the behaviors of the counters. A counter is r-reversal- 
bounded if it changes mode between nondecreasing and nonincreasing at most r times. 
For instance, the following sequence of counter values: 

0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 3, 2, 1, 1, 1, 1, ...

exhibits only one counter reversal. M is reversal-bounded if each counter in M is
r-reversal-bounded for some r. M is a reversal-bounded NCM with a free counter (NCMF)
if M has a number of reversal-bounded counters and one unrestricted counter (which need
not be reversal-bounded). From now on, an NCM (NCMF) refers to a machine with
reversal-bounded counters (and one free counter). We assume throughout that whenever
we are given an NCM (NCMF), the reversal bound r is also specified.
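On a finite sequence of counter values, reversals can be counted by tracking the current direction of change and ignoring steps where the value stays the same. The Python sketch below (function names are hypothetical) does this and reproduces the single reversal of the example sequence above. Reversal-boundedness is a property of all computations of M, so checking one finite sequence only illustrates the definition.

```python
from typing import Iterable, Optional


def count_reversals(values: Iterable[int]) -> int:
    """Number of switches between nondecreasing and nonincreasing mode.

    Steps where the value is unchanged do not affect the current mode.
    """
    reversals = 0
    direction = 0  # 0: no strict change yet, +1: increasing, -1: decreasing
    prev: Optional[int] = None
    for v in values:
        if prev is not None and v != prev:
            step = 1 if v > prev else -1
            if direction != 0 and step != direction:
                reversals += 1
            direction = step
        prev = v
    return reversals


def is_r_reversal_bounded(values: Iterable[int], r: int) -> bool:
    return count_reversals(values) <= r
```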

A fundamental result for NCMFs is that the binary reachability is Presburger. This 
characterization is quite useful, since it is well known that the emptiness and the validity 
problems for Presburger formulas are decidable. 

Theorem 1. The binary reachability is effectively Presburger definable for a reversal- 
bounded nondeterministic multicounter machine with a free counter. [15,18] 

This fundamental result allows us to automatically verify a Presburger safety analysis
problem for an NCMF M [8,18]: from configurations in I, M can only reach configurations
in P, where I and P are Presburger-definable sets of configurations. This problem
is equivalent to $\neg \exists \alpha \, \exists \beta \, (\alpha \in I \wedge \alpha \leadsto_M \beta \wedge \beta \notin P)$, which, by Theorem 1, is a
Presburger formula and therefore decidable.

In this paper, we systematically investigate Presburger liveness analysis problems
for NCMFs by considering their ω-paths. We follow the notation of [12]. Let M be an
NCMF, I and P be two Presburger-definable sets of configurations, and p be an ω-path
$(s_0, V^0) \cdots (s_n, V^n) \cdots$. We say that p starts from I if $(s_0, V^0) \in I$. Define:

- p is P-i.o. if P is satisfied infinitely often on the ω-path, i.e., there are infinitely
many n such that $(s_n, V^n) \in P$.

- p is P-always if for each n, $(s_n, V^n) \in P$.

- p is P-eventual if there exists n such that $(s_n, V^n) \in P$.

- p is P-almost-always if there exists n such that for all $n' > n$, $(s_{n'}, V^{n'}) \in P$.
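The four definitions can be illustrated on the special case of an ultimately periodic ω-sequence of configurations, given as a finite prefix followed by a non-empty cycle that repeats forever. This representation is our own simplifying assumption: a general ω-path of an NCMF need not be ultimately periodic (the free counter may, for instance, grow without bound), so the Python sketch below, with hypothetical function names, only illustrates the definitions.

```python
from typing import Callable, Sequence, Tuple

Config = Tuple[str, Tuple[int, ...]]
Pred = Callable[[Config], bool]


def _check(cycle: Sequence[Config]) -> None:
    if not cycle:
        raise ValueError("the repeated cycle must be non-empty")


def is_io(prefix: Sequence[Config], cycle: Sequence[Config], P: Pred) -> bool:
    """P-i.o.: exactly the cycle configurations occur infinitely often."""
    _check(cycle)
    return any(P(c) for c in cycle)


def is_always(prefix: Sequence[Config], cycle: Sequence[Config], P: Pred) -> bool:
    _check(cycle)
    return all(P(c) for c in prefix) and all(P(c) for c in cycle)


def is_eventual(prefix: Sequence[Config], cycle: Sequence[Config], P: Pred) -> bool:
    _check(cycle)
    return any(P(c) for c in prefix) or any(P(c) for c in cycle)


def is_almost_always(prefix: Sequence[Config], cycle: Sequence[Config], P: Pred) -> bool:
    """From some position on, the path stays inside the cycle."""
    _check(cycle)
    return all(P(c) for c in cycle)


prefix = [("a", (0,)), ("b", (1,))]
cycle = [("c", (2,)), ("d", (3,))]
```

For example, the predicate "x0 >= 2" is i.o. and almost-always but not always on this path, while "the state is a" is eventual but not i.o.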

The ∃-Presburger-i.o. (resp. always, eventual, and almost-always) problem for NCMF
M is to decide whether the following statement holds:

there is an ω-path p starting from I that is P-i.o. (resp. P-always, P-eventual, and
P-almost-always).

The ∀-Presburger-i.o. (resp. always, eventual, and almost-always) problem for NCMF
M is to decide whether the following statement holds:

for every ω-path p, if p starts from I, then p is P-i.o. (resp. P-always, P-eventual,
and P-almost-always).

We use X to denote the vector of the k + 1 counters $x_0, x_1, \ldots, x_k$ in M, with $x_0$
the free counter and $x_1, \ldots, x_k$ the reversal-bounded counters.

3 Decidable Results 

In this section, we show that both the ∃-Presburger-i.o. problem and the
∃-Presburger-eventual problem are decidable for NCMFs.

3.1 The ∃-Presburger-i.o. Problem Is Decidable

The ∃-Presburger-i.o. problem is to determine the existence of an ω-path
$p = (s_0, V^0) \cdots (s_n, V^n) \cdots$ (called a witness) of an NCMF M such that p starts from I and is P-i.o.
Since P is a Presburger-definable set of configurations, by definition, $P(X, s)$ can
be written in disjunctive normal form, $\bigvee_i P_i(X, s)$, where each $P_i(X, s)$ is a
linear-mod-conjunction of atomic linear constraints and atomic mod-constraints over the counters
and control states (control states are encoded as bounded integers) in M. Obviously, p
is P-i.o. iff p is $P_i$-i.o. for some i. Therefore, without loss of generality, we assume that P
itself is a linear-mod-conjunction.

There are only finitely many control states $S = \{s_1, \ldots, s_m\}$ in M. Therefore,
$P(X, s)$ can be written as $\bigvee_{s_i \in S} (s = s_i \wedge P(X, s_i))$. Then p is P-i.o. iff p is $P(\cdot, s_i)$-i.o.
on some control state $s_i$, i.e., there are infinitely many n such that $s_n = s_i$ and $P(V^n, s_i)$.
Therefore, the ∃-Presburger-i.o. problem is reduced to the problem of deciding whether
there exist a control state s and a witness $p = (s_0, V^0) \cdots (s_n, V^n) \cdots$ starting from I
that is P-i.o. on s, where P is a linear-mod-conjunction on the counters X only.
Assume that $P(X)$ is $P^{\mathrm{lin}}(X) \wedge P^{\mathrm{mod}}(X)$, where $P^{\mathrm{lin}}$ is a linear-conjunction
over X and $P^{\mathrm{mod}}$ is a mod-conjunction over X. The following lemma states that, as far
as an infinitely-often property is concerned, $P^{\mathrm{mod}}$ can be eliminated by building "mod"
into the control states of M.

Lemma 1. Given M (an NCMF with counters X and control states S), I (a
Presburger-definable set of configurations of M), P (a linear-mod-conjunction over
X), and s (a control state in S), we can effectively construct M' (an NCMF with
counters X and control states S'), I' (a Presburger-definable set of configurations
of M'), P' (a linear-conjunction over X), and s' (a control state in S') such that the
following two statements are equivalent:

- In M, there exists a witness p starting from I such that p is P-i.o. on state s,

- In M', there exists a witness p' starting from I' such that p' is P'-i.o. on state s'.

Because of Lemma 1, it suffices to investigate the existence of a P-i.o. witness p on state
s with P in the form of a linear-conjunction of m linear constraints:

$$\bigwedge_{1 \le j \le m} \; \sum_{0 \le i \le k} a_{ij} x_i \;\#_j\; b_j, \tag{1}$$

where $\#_j$ stands for $>$ or $=$, for $1 \le j \le m$. We use A, $\#$, and b to denote the coefficient
matrix ($m \times (k+1)$) of the $a_{ij}$, the column ($m \times 1$) of comparisons $\#_j$, and the column
($m \times 1$) of numbers $b_j$. Thus, P shown in (1) can be written as

$$A X \,\#\, b. \tag{2}$$

We say that a $(k+1)$-ary vector $\Delta$ is P-positive if $A\Delta \ge 0$ (componentwise). By definition,
an ω-path $p = (s_0, V^0) \cdots (s_n, V^n) \cdots$ is a desired witness iff the following conditions are satisfied:

(IO-1). p starts from I; i.e., $I(s_0, V^0)$ holds,

(IO-2). There are infinitely many numbers $n_1, \ldots, n_i, \ldots$ (with $0 \le n_1 < \cdots < n_i < \cdots$)
such that $s_{n_i} = s$ and $P(V^{n_i})$ for each i.

The following lemma states that condition (IO-2) can be strengthened: for each i,
$V^{n_{i+1}} - V^{n_i}$ is P-positive.

Lemma 2. Let P be a linear-conjunction as in (2). Let s be a state in an NCMF M. For
any ω-path p of M, condition (IO-2) is equivalent to the following condition:

(IO-2'). There are infinitely many numbers $n_1, \ldots, n_i, \ldots$ (with $0 \le n_1 < \cdots < n_i < \cdots$)
such that $s_{n_i} = s$, $P(V^{n_i})$, and $V^{n_{i+1}} - V^{n_i}$ is P-positive,
for each i.
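The reason P-positivity matters can be seen directly from (2). If $P(V)$ and $P(V + \Delta)$ hold and $A\Delta \ge 0$, then each row with $\#_j$ being $>$ gives $A_j(V + n\Delta) \ge A_j V > b_j$, and each row with $\#_j$ being $=$ forces $A_j \Delta = 0$, so $P(V + n\Delta)$ holds for every $n \ge 0$. (For the nontrivial direction of Lemma 2, one way to see it is that each integer sequence $A_j V^{n_i}$ is bounded below by $b_j$ and therefore has an infinite nondecreasing subsequence.) The Python sketch below checks these facts on a small example of our own; the helper names and the matrix are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def row_values(A: Matrix, x: Sequence[int]) -> List[int]:
    """The column A x (one entry per constraint j)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def satisfies_P(A: Matrix, ops: Sequence[str], b: Sequence[int], x: Sequence[int]) -> bool:
    """P(X) written as A X # b, with each #_j either '>' or '='."""
    return all((lhs > bj) if op == ">" else (lhs == bj)
               for lhs, op, bj in zip(row_values(A, x), ops, b))


def is_P_positive(A: Matrix, delta: Sequence[int]) -> bool:
    """Delta is P-positive iff A Delta >= 0 componentwise."""
    return all(v >= 0 for v in row_values(A, delta))


def pumped(v: Sequence[int], delta: Sequence[int], n: int) -> List[int]:
    """The vector V + n * Delta."""
    return [vi + n * di for vi, di in zip(v, delta)]


# k + 1 = 3 counters (x0, x1, x2), m = 2 constraints: x0 + x1 > 2 and x1 - x2 = 0.
A = [[1, 1, 0], [0, 1, -1]]
ops = [">", "="]
b = [2, 0]

V, delta = [1, 2, 2], [-1, 3, 3]  # free counter x0 decreases, yet delta is P-positive
W, bad = [1, 4, 4], [0, -1, -1]   # P(W) and P(W + bad) hold, but bad is not P-positive
```

Pumping `delta` from `V` stays inside P for any number of repetitions, whereas pumping `bad` from `W` leaves P after three repetitions.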

Up to now, we have not used the fact that the counters $x_1, \ldots, x_k$ are reversal-bounded
and that the counter $x_0$ is free. Let C be the largest absolute value of the integer
constants appearing in all the tests in M. The idea is that, on the ω-path p, each
reversal-bounded counter eventually behaves as a 0-reversal-bounded (i.e., either
nondecreasing or nonincreasing) counter after its last reversal has been made. Once a
reversal-bounded counter behaves as 0-reversal-bounded, it either stays unchanged at some
value between $-C$ and $C$ forever, or moves beyond $C$ (or $-C$) and never comes back. That
is, there is $n_0$ such that each reversal-bounded counter $x_i$, $1 \le i \le k$, has one of the
following $2C + 3$ modes:

(MD1-c) with $-C \le c \le C$. For all $n \ge n_0$, $V_i^n = V_i^{n+1} = c$. That is, $x_i$ is
always c, which is between $-C$ and $C$, after $n_0$,

(MD2). For all $n \ge n_0$, $C < V_i^n \le V_i^{n+1}$. That is, $x_i$ is nondecreasing and
always greater than $C$,

(MD3). For all $n \ge n_0$, $-C > V_i^n \ge V_i^{n+1}$. That is, $x_i$ is nonincreasing and
always smaller than $-C$.
Let a mode vector $\theta \in (\{\mathrm{MD1}\text{-}c : -C \le c \le C\} \cup \{\mathrm{MD2}, \mathrm{MD3}\})^k$ assign to each
reversal-bounded counter $x_i$ a mode $\theta_i$. Each ω-path p has a unique mode vector. Now
we fix an arbitrary mode vector θ.
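Since each counter has $2C + 3$ possible modes, there are $(2C+3)^k$ mode vectors, so any quantification over mode vectors ranges over a finite set. The Python sketch below enumerates the mode vectors and classifies the values of a counter from position $n_0$ on into one of the three modes; the string encoding `"MD1-c"`, `"MD2"`, `"MD3"` and the function names are our own assumptions.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple


def modes(C: int) -> List[str]:
    """The 2C + 3 modes of one reversal-bounded counter: MD1-c (|c| <= C), MD2, MD3."""
    return [f"MD1-{c}" for c in range(-C, C + 1)] + ["MD2", "MD3"]


def mode_vectors(C: int, k: int) -> Iterator[Tuple[str, ...]]:
    """All mode vectors theta, assigning a mode to each of x1, ..., xk."""
    return product(modes(C), repeat=k)


def mode_of_tail(tail: Sequence[int], C: int) -> str:
    """Mode of a counter whose values from position n0 on are `tail`.

    The tail is assumed to have one of the three shapes of the text;
    a ValueError is raised otherwise.
    """
    if not tail:
        raise ValueError("empty tail")
    pairs = list(zip(tail, tail[1:]))
    if all(v == tail[0] for v in tail) and -C <= tail[0] <= C:
        return f"MD1-{tail[0]}"
    if all(a <= b for a, b in pairs) and min(tail) > C:
        return "MD2"
    if all(a >= b for a, b in pairs) and max(tail) < -C:
        return "MD3"
    raise ValueError("tail matches no mode")
```

Note that a counter that settles at a constant value above $C$ falls under MD2, not MD1-c, exactly as in the definition of the modes.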

M can be effectively modified into an NCMF $M^\theta$ such that the reversal-bounded
counters in $M^\theta$ behave according to the mode vector θ. An edge $(s, t, incr, s')$ in
M is compatible with the mode vector θ if, for each reversal-bounded counter $x_i$ with
$1 \le i \le k$, the following conditions hold:

- If $x_i$ is in mode MD1-c for some $-C \le c \le C$, $x_i$ does not change on the edge; i.e.,
$incr_i = 0$ if $\theta_i$ = MD1-c,

- If $x_i$ is in mode MD2, $x_i$ does not decrease on the edge; i.e., $incr_i \ge 0$ if $\theta_i$ = MD2,

- If $x_i$ is in mode MD3, $x_i$ does not increase on the edge; i.e., $incr_i \le 0$ if $\theta_i$ = MD3.

The modification starts by deleting from M all the edges that are not compatible with
θ. Then, more tests are added to the remaining edges to make sure that the
reversal-bounded counters always have the desired values. More precisely, for each $x_i$
with $1 \le i \le k$ and for each remaining edge $(s, t, incr, s')$ in M, if $x_i$ is in mode
MD2, then we add a test $x_i > C$ to the original test t of the edge. Doing this
guarantees that the values of $x_i$ before and after this edge are greater than $C$ (no matter
whether $incr_i = 0$ or $incr_i = 1$). The case when $x_i$ is in mode MD3 is handled
similarly, with the test $x_i < -C$. If, however, $x_i$ is in mode MD1-c for some $-C \le c \le C$,
we simply add a test $x_i = c$ to the original test t of the edge. The result $M^\theta$ is also an
NCMF, whose reversal-bounded counters are 0-reversal-bounded.
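The construction of $M^\theta$ is a purely syntactic filter-and-strengthen step on the edges of M. The Python sketch below mirrors it under assumptions of ours: tests are Python predicates on the counter vector $(x_0, \ldots, x_k)$, `theta` lists the modes of $x_1, \ldots, x_k$ in the string encoding `"MD1-c"`, `"MD2"`, `"MD3"`, and the names `compatible`, `mode_test`, and `restrict_to_mode` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vec = Tuple[int, ...]                                # (x0, x1, ..., xk)
Edge = Tuple[str, Callable[[Vec], bool], Vec, str]  # (s, t, incr, s')


def md1_value(mode: str) -> Optional[int]:
    """c for mode 'MD1-c', None for 'MD2' and 'MD3'."""
    return int(mode[len("MD1-"):]) if mode.startswith("MD1-") else None


def compatible(incr: Vec, theta: Sequence[str]) -> bool:
    """Edge compatibility; theta[i - 1] is the mode of counter x_i (x0 is free)."""
    for i, mode in enumerate(theta, start=1):
        if md1_value(mode) is not None and incr[i] != 0:
            return False
        if mode == "MD2" and incr[i] < 0:
            return False
        if mode == "MD3" and incr[i] > 0:
            return False
    return True


def mode_test(theta: Sequence[str], C: int) -> Callable[[Vec], bool]:
    """The extra test: x_i = c for MD1-c, x_i > C for MD2, x_i < -C for MD3."""
    def test(v: Vec) -> bool:
        for i, mode in enumerate(theta, start=1):
            c = md1_value(mode)
            if c is not None and v[i] != c:
                return False
            if mode == "MD2" and not v[i] > C:
                return False
            if mode == "MD3" and not v[i] < -C:
                return False
        return True
    return test


def restrict_to_mode(edges: List[Edge], theta: Sequence[str], C: int) -> List[Edge]:
    """Edges of M^theta: drop incompatible edges, conjoin the mode test to t."""
    extra = mode_test(theta, C)
    return [(s, (lambda v, t=t: t(v) and extra(v)), incr, s2)
            for (s, t, incr, s2) in edges if compatible(incr, theta)]


# Counters (x0, x1, x2); theta: x1 in mode MD2, x2 in mode MD1-0; C = 2.
theta = ("MD2", "MD1-0")
edges = [
    ("p", lambda v: True, (0, 1, 0), "p"),       # compatible
    ("p", lambda v: True, (1, -1, 0), "q"),      # x1 decreases: dropped
    ("q", lambda v: v[0] > 0, (-1, 0, 0), "q"),  # compatible
]
```

The default argument `t=t` binds each edge's own original test inside the comprehension; without it, every new test would refer to the last edge's test.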

Obviously, from the choice of the constant C, $M^\theta$ is insensitive to the actual starting
values of the 0-reversal-bounded counters. That is, suppose that (1) $(s^1, V^1)$ can reach
$(s^2, V^1 + \Delta^1)$ through a path $p_1$ in $M^\theta$, and (2) $(s^2, V^2)$ can reach $(s^3, V^2 + \Delta^2)$
through a path $p_2$ in $M^\theta$, such that the free counter $x_0$ has the same value at the end of $p_1$
and at the beginning of $p_2$, i.e., $V_0^1 + \Delta_0^1 = V_0^2$. Then each 0-reversal-bounded counter $x_i$
with $1 \le i \le k$ in $p_2$ can start from $V_i^1 + \Delta_i^1$ (instead of $V_i^2$), and at the end of $p_2$, $x_i$
has value $V_i^1 + \Delta_i^1 + \Delta_i^2$ (instead of $V_i^2 + \Delta_i^2$). Thus, path $p_1$ can be extended according
to path $p_2$. The reason is that, since every constant in the tests has absolute value at most C,
after changing the starting value of $x_i$, the test on $x_i$ at each edge of path $p_2$ gives the same
truth value as with the old starting value, and, hence, path $p_2$ can be perfectly followed after $p_1$.
This is summarized in the following technical lemma.

Lemma 3. For any control states $s^1$, $s^2$, and $s^3$, any mode vector θ, and any $(k+1)$-ary
vectors $V^1$, $V^2$, $\Delta^1$, and $\Delta^2$, if $V_0^1 + \Delta_0^1 = V_0^2$, $(s^1, V^1) \leadsto_{M^\theta} (s^2, V^1 + \Delta^1)$, and
$(s^2, V^2) \leadsto_{M^\theta} (s^3, V^2 + \Delta^2)$, then $(s^2, V^1 + \Delta^1) \leadsto_{M^\theta} (s^3, V^1 + \Delta^1 + \Delta^2)$.

Let $I' = \{\beta : \exists \alpha \in I \; (\alpha \leadsto_M \beta)\}$; that is, I' is the set of configurations
reachable from configurations in I. By Theorem 1, I' is Presburger. The following lemma states
that the ∃-Presburger-i.o. problem for M can be reduced to the same problem for the
0-reversal-bounded NCMFs $M^\theta$.

Lemma 4. There exists a witness p in M starting from I that is P-i.o. at state s iff, for
some mode vector θ, there exists a witness p' in $M^\theta$ starting from I' that is P-i.o. at
state s.
For any state s and mode vector θ, we define a predicate $Q^{s,\theta}(v, v')$ over integers as follows:
$Q^{s,\theta}(v, v')$ holds iff there exist two vectors V and Δ such that the following statements are
satisfied:

(Q1). v and v' are the values of the free counter; i.e., $v = V_0$ and $v' = V_0 + \Delta_0$;

(Q2). Both V and $V + \Delta$ satisfy P; i.e., $P(V) \wedge P(V + \Delta)$;

(Q3). Configuration $(s, V)$ can reach configuration $(s, V + \Delta)$ in $M^\theta$; i.e., $(s, V) \leadsto_{M^\theta} (s, V + \Delta)$;

(Q4). Δ is P-positive;

(Q5). Finally, configuration $(s, V)$ is reachable from some configuration in I; i.e., $(s, V) \in I'$.

Lemma 5. For any state s and mode vector θ, $Q^{s,\theta}$ is Presburger.

It is easy to check that $Q^{s,\theta}$ is transitive.

Lemma 6. For any state s and mode vector θ, $Q^{s,\theta}$ is transitive. That is, for all integers
$v_1$, $v_2$, and $v_3$, $Q^{s,\theta}(v_1, v_2) \wedge Q^{s,\theta}(v_2, v_3)$ implies $Q^{s,\theta}(v_1, v_3)$.

Before we go any further, we need to uncover the intuitive meaning underlying
the definition of $Q^{s,\theta}$. $Q^{s,\theta}(v, v')$ indicates the following scenario. Through a path in
$M^\theta$, the free counter $x_0$ can be sent from value v to v', with some properly chosen
starting values for the 0-reversal-bounded counters ((Q1) and (Q3)). On the path, $M^\theta$
starts from control state s and finally moves back to the same control state, as given in
(Q3). Therefore, this path is a loop on the control state s. Note that the starting
configuration and the ending configuration of the path both satisfy P (as given in (Q2)),
and, in particular, the counter change Δ is P-positive (as given in (Q4)).

If we could repeat the loop, then the resulting ω-path would be P-i.o. (because of Lemma
2 and the fact that Δ is P-positive) and, by (Q5), would start from I'. However, this loop
may not be repeatable. The reason is that the starting value v of the free counter determines
the path of the loop, and, therefore, when $M^\theta$ executes the loop a second time, the starting
value v' of the free counter may lead to a different path. Thus, trying to repeat the same
loop is too naive. Instead, the key technique shown below concatenates
infinitely many (different) loops into an ω-path that is a P-i.o. witness.

Let $v^\omega$ be an ω-sequence of integers

$$v_0, v_1, \ldots, v_n, \ldots.$$

Then $v^\omega$ is an ω-chain of $Q^{s,\theta}$ if $Q^{s,\theta}(v_n, v_{n+1})$ holds for all $n \ge 0$. By Lemma 6,
$Q^{s,\theta}$ is transitive. Therefore, if $v^\omega$ is an ω-chain, then $Q^{s,\theta}(v_n, v_m)$ holds for any $n < m$.
The following lemma states that the existence of an ω-chain for $Q^{s,\theta}$ is decidable.

Lemma 7. It is decidable whether a transitive Presburger predicate over two variables
has an ω-chain. Thus, by Lemma 5 and Lemma 6, it is decidable whether $Q^{s,\theta}$ has an
ω-chain.
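One elementary observation, added here only for illustration, shows how transitivity constrains ω-chains: if an ω-chain ever revisits a value v, then $Q^{s,\theta}(v, v)$ holds by transitivity, and conversely any v with $Q^{s,\theta}(v, v)$ yields the constant ω-chain $v, v, v, \ldots$. The Python sketch below treats a relation as a black-box predicate, checks finite chain prefixes, and searches a finite window for such a reflexive witness. It is a bounded search that can confirm but never refute the existence of an ω-chain, and it is not the decision procedure behind Lemma 7; the function names and the example relations are our own.

```python
from typing import Callable, Optional, Sequence

Rel = Callable[[int, int], bool]


def is_chain_prefix(Q: Rel, seq: Sequence[int]) -> bool:
    """Q(v_n, v_{n+1}) for all consecutive elements of a finite prefix."""
    return all(Q(a, b) for a, b in zip(seq, seq[1:]))


def reflexive_witness(Q: Rel, lo: int, hi: int) -> Optional[int]:
    """Some v in [lo, hi] with Q(v, v); then v, v, v, ... is an omega-chain."""
    for v in range(lo, hi + 1):
        if Q(v, v):
            return v
    return None


# Transitive and irreflexive: its omega-chains (e.g. 0, 2, 4, ...) never
# repeat a value, so the bounded search for a reflexive witness finds nothing.
def Q1(v: int, w: int) -> bool:
    return w >= v + 2


# Transitive; reflexive exactly on the multiples of 5.
def Q2(v: int, w: int) -> bool:
    return w == v and v % 5 == 0
```

`Q1` is the instructive case: it has ω-chains, but only ones with infinitely many distinct values, which is why a real decision procedure has to reason about the Presburger structure of the relation rather than search.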

We now show that the existence of a P-i.o. witness p at state s starting from I is
equivalent to the existence of an ω-chain of $Q^{s,\theta}$ for some mode vector θ.

Lemma 8. There is an ω-path p that is P-i.o. at state s and starts from I iff, for some
mode vector θ, $Q^{s,\theta}$ has an ω-chain.

Finally, combining Lemma 7 and Lemma 8, we have:
Theorem 2. The ∃-Presburger-i.o. problem is decidable for reversal-bounded multicounter
machines with a free counter.

The ∃-Presburger-i.o. problem for I and P is equivalent to the negation of the
∀-Presburger-almost-always problem for I and the complement of P. Thus,

Theorem 3. The ∀-Presburger-almost-always problem is decidable for reversal-bounded
multicounter machines with a free counter.

3.2 The ∃-Presburger-Eventual Problem Is Decidable

Given two Presburger-definable sets I and P of configurations for an NCMF M, the
∃-Presburger-eventual problem is to decide whether there exists a P-eventual ω-path p
starting from I. Here we let I' denote the set of all configurations in P that are reachable
from a configuration in I; by Theorem 1, this I' is Presburger-definable. In the following lemma,
true denotes the set of all configurations. It is easy to see that

Lemma 9. There is a P-eventual ω-path starting from I iff there is a true-i.o. ω-path
starting from I'.

Hence, combining Lemma 9 and Theorem 2, we have:

Theorem 4. The ∃-Presburger-eventual problem is decidable for reversal-bounded
multicounter machines with a free counter.

Since the ∃-Presburger-eventual problem for I and P is equivalent to the negation of the
∀-Presburger-always problem for I and the complement of P, we have:

Theorem 5. The ∀-Presburger-always problem is decidable for reversal-bounded
multicounter machines with a free counter.

The Presburger safety analysis problem is slightly different from the ∀-Presburger-always
problem: the former looks at (finite) paths, while the latter looks at ω-paths.

4 Undecidability Results 

In this section, we point out that both the ∃-Presburger-always problem and the
∃-Presburger-almost-always problem are undecidable for 0-reversal-bounded NCMs.
Obviously, the undecidability remains when NCMFs are considered.

In [12], it is shown that the ∃-Presburger-always problem and the ∃-Presburger-almost-always
problem are undecidable for discrete timed automata. The following
techniques are used in that paper:

- A deterministic two-counter machine can be simulated by a generalized discrete
timed automaton that allows tests in the form of linear constraints,

- The generalized discrete timed automaton can be simulated by a discrete timed
automaton under a Presburger path restriction P [12] (i.e., each intermediate
configuration of the discrete timed automaton must be in P),

- The halting problem (i.e., whether a control state is reachable, which is undecidable)
for deterministic two-counter machines can be reduced to the ∃-Presburger-always
problem for discrete timed automata,

- The finiteness problem (which is undecidable) for deterministic two-counter
machines can be reduced to the ∃-Presburger-almost-always problem for discrete timed
automata.

If in the items above, “discrete timed automaton” is replaced by “0-reversal-bounded 
multicounter machine”, the techniques are still applicable. The reason is that, as shown 
below, any deterministic two-counter machine can be simulated by a deterministic gen- 
eralized 0-reversal-bounded multicounter machine that allows tests in the form of linear 
constraints on counters. 

Lemma 10. Any deterministic two-counter machine M can be simulated by a deterministic
0-reversal-bounded multicounter machine M' that allows tests of the form
$y - z \,\#\, c$, where y and z are counters, $\#$ is a comparison as in standard tests, and c is an integer [18].
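One standard way to see why such difference tests suffice (Lemma 10 is attributed to [18]; we assume this is the underlying idea) is to represent each counter x of the two-counter machine as $x = y - z$ for two nondecreasing counters y and z: incrementing x increments y, decrementing x increments z, and the zero test on x becomes the test $y - z = 0$. The Python sketch below, with a hypothetical instruction format and function names, runs a deterministic two-counter program both directly and in this encoding, in which y and z are only ever incremented.

```python
from typing import List, Tuple

# Hypothetical instruction format for a deterministic two-counter machine:
#   ("inc", r, j)                x_r += 1; go to j
#   ("decjz", r, j_zero, j_pos)  if x_r == 0 go to j_zero, else x_r -= 1 and go to j_pos
#   ("halt",)
Instr = Tuple


def run_two_counter(prog: List[Instr], max_steps: int = 10_000):
    pc, x = 0, [0, 0]
    for _ in range(max_steps):
        ins = prog[pc]
        if ins[0] == "halt":
            return pc, tuple(x)
        r = ins[1]
        if ins[0] == "inc":
            x[r] += 1
            pc = ins[2]
        elif x[r] == 0:
            pc = ins[2]
        else:
            x[r] -= 1
            pc = ins[3]
    raise RuntimeError("step bound exceeded")


def run_monotone(prog: List[Instr], max_steps: int = 10_000):
    """Same machine with x_r encoded as y[r] - z[r]; y and z never decrease."""
    pc, y, z = 0, [0, 0], [0, 0]
    for _ in range(max_steps):
        ins = prog[pc]
        if ins[0] == "halt":
            return pc, (y[0] - z[0], y[1] - z[1]), tuple(y), tuple(z)
        r = ins[1]
        if ins[0] == "inc":
            y[r] += 1              # x_r + 1 == (y_r + 1) - z_r
            pc = ins[2]
        elif y[r] - z[r] == 0:     # zero test as the difference test y - z = 0
            pc = ins[2]
        else:
            z[r] += 1              # x_r - 1 == y_r - (z_r + 1)
            pc = ins[3]
    raise RuntimeError("step bound exceeded")


# x0 := 3, then repeatedly take one unit from x0 and add 2 to x1 (ends with x1 = 6).
prog = [
    ("inc", 0, 1), ("inc", 0, 2), ("inc", 0, 3),  # 0-2: x0 = 3
    ("decjz", 0, 6, 4),                            # 3: if x0 == 0 go to 6
    ("inc", 1, 5), ("inc", 1, 3),                  # 4-5: x1 += 2, back to 3
    ("halt",),                                     # 6
]
```

Both runs halt at instruction 6 with counters (0, 6); in the monotone encoding the final values are y = (3, 6) and z = (3, 0).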

Analogously to the proofs in [12], we obtain:

Theorem 6. The ∃-Presburger-always problem and the ∃-Presburger-almost-always
problem are undecidable for 0-reversal-bounded multicounter machines. The undecidability
remains when reversal-bounded multicounter machines with a free counter are
considered.

Considering the negations of the two problems, we have:

Theorem 7. The ∀-Presburger-eventual problem and the ∀-Presburger-i.o. problem are
undecidable for 0-reversal-bounded multicounter machines. The undecidability remains
when reversal-bounded multicounter machines with a free counter are considered.



5 Conclusions 

In this paper, we investigated a number of Presburger liveness problems for NCMFs. We
showed that

- The ∃-Presburger-i.o. problem and the ∃-Presburger-eventual problem are both
decidable. So are their duals, the ∀-Presburger-almost-always problem and the
∀-Presburger-always problem.

- The ∀-Presburger-i.o. problem and the ∀-Presburger-eventual problem are both
undecidable. So are their duals, the ∃-Presburger-almost-always problem and the
∃-Presburger-always problem.

These results can be used to formulate a weak form of Presburger linear temporal logic
and to develop its model-checking theory for NCMFs. We believe the techniques developed
in [12] and in this paper can be naturally combined to study the same set of
liveness problems for an extended form of discrete timed automata containing, besides
clocks, a number of reversal-bounded counters and a free counter. We conjecture that
the ∃-Presburger-i.o. problem is also decidable for (discrete timed) pushdown automata
when the counts of individual stack symbols are part of the Presburger property being
verified [8].

As for applications of NCMFs, “reversal-bounded counters” may appear unnatural,
and applying the decidability results presented in this paper in model checking may seem
remote. However, the model of NCMFs does have applications in verifying and debugging
infinite-state systems, as we discuss below.

- Many infinite-state systems can be modeled as multicounter machines. These machines,
usually having Turing computing power, can be approximated by NCMFs by
restricting all but one counter to be reversal-bounded. This approximation technique
provides a way to debug Presburger safety properties of, for instance, arithmetic
programs (for a number of conservative approximation techniques for real-time
systems, see [9, 10, 11]). On the other hand, the technique also provides a way to verify an
∃-Presburger-i.o. problem for a multicounter machine if the same problem is true
on the resulting NCMF.

- A nondecreasing counter is also a reversal-bounded counter with reversal
bound zero. Counters of this kind have many applications. For instance, such a counter can
be used to count elapsed time, the number of external events, or the number of times a
particular branch is taken by a nondeterministic program (this is important when fairness
is taken into account). For example, consider a finite-state transition system T. Associate a
name ‘a’ from a finite alphabet with each transition in T (in the reactive system T, a
can be treated as the input signal triggering the transition). At any moment in an
execution of T, #a is used to count the number of transitions labeled by a that have
been executed. Each #a can be considered as a 0-reversal-bounded counter, since
#a is nondecreasing along any execution path. To make the system more complex,
on some transitions, the triggering conditions also contain a test that compares
#b − #c against an integer constant, for some fixed labels b and c*. Essentially,
T can be treated as an NCMF: the counts #a are reversal-bounded counters
and #b − #c is the free counter. The results in this paper show that the following
statement can be automatically verified:

There is an execution of T such that $\ldots - 5\#c > 0$ holds infinitely often.

This result can be used to argue whether a fairness condition on the event label
counts of T is realistic.
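The counting view in this example can be illustrated on a finite execution: every label count #a only grows, while the difference #b − #c may go up and down, which is why it plays the role of the free counter. The Python sketch below computes these quantities along a made-up execution; the labels, the execution, and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence


def label_counts(execution: Sequence[str]) -> List[Dict[str, int]]:
    """The counts #a of every label after each step of a finite execution."""
    counts: Counter = Counter()
    history = []
    for label in execution:
        counts[label] += 1
        history.append(dict(counts))
    return history


def free_counter(history: List[Dict[str, int]], b: str = "b", c: str = "c") -> List[int]:
    """Values of #b - #c along the execution."""
    return [h.get(b, 0) - h.get(c, 0) for h in history]


execution = ["a", "b", "b", "c", "a", "c", "c", "b"]
```

Along this execution each individual count is nondecreasing, while #b − #c takes the values 0, 1, 2, 1, 1, 0, −1, 0 and therefore reverses more than once.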

The decision procedure for the ∃-Presburger-i.o. problem seems hard to implement.
However, a close look at the proofs shows that the hard part is how to (practically) calculate
the binary reachability of an NCMF. Once this is done, testing the existence of the
ω-chain in Lemma 7 and Lemma 8 is equivalent to checking a Presburger predicate (i.e.,
$Q^{s,\theta}$ in the lemmas) in a particular format (the Omega Library [22] can be used to do
the checking). Calculating the binary reachability of an NCMF requires careful software
engineering. We are currently implementing a prototype tool.
We thank the anonymous reviewers for many useful suggestions.

* It is important that b and c are fixed. If we allow comparisons on the counts of four labels (i.e.,
besides the test on #b − #c, we have a test on #d − #e), then T is Turing powerful [18].
Abstract. We report on an ongoing effort in mechanically proving correct a compiling
specification for a bootstrap compiler from ComLisp (a subset of ANSI Common Lisp
sufficiently expressive to serve as a compiler implementation language) to binary
Transputer code using the PVS system. The compilation is carried out in four steps
through a series of intermediate languages. This paper focuses on the first phase, namely,
the compilation of ComLisp to the stack-intermediate language SIL, where parameter
passing is implemented by a stack technique. The context of this work is the joint
research effort Verifix, which aims at developing methods for the construction of correct
compilers for realistic programming languages.

1 Introduction 

The use of computer based systems for safety-critical applications requires high 
dependability of the software components. In particular, it justifies and demands 
the verification of programs typically written in high-level programming lan- 
guages. Correct program execution, however, crucially depends on the correct- 
ness of the binary machine code executable, and therefore, on the correctness 
of system software, especially compilers. As already noted in 1986 by Chirica 
and Martin [3], full compiler correctness comprises both the correctness of the 
compiling specification (with respect to the semantics of the languages involved) 
as well as the correct implementation of the specification. 

Verifix [9, 6] is a joint German research effort of groups at the universities of
Karlsruhe, Kiel, and Ulm. The project aims at developing innovative methods for
constructing provably correct compilers which generate efficient code for realistic,
practically relevant programming languages. These realistic compilers are to be
constructed using approved development techniques. In particular, even standard
unverified compiler generation tools (such as Lex or Yacc) may be used, the
correctness of the generated code being verified at compile time using verified
program checkers [7]. Verifix assumes hardware to behave correctly as described
in the instruction manuals.

* This research has been funded by the Deutsche Forschungsgemeinschaft (DFG) under
project “Verifix”.

In order not to have to write the verified parts of the compiler and checkers
directly in machine code, a fully verified and correctly implemented initial compiler
is required, for which efficiency of the produced code is not a priority. The
initial correct compiler to be constructed in this project transforms ComLisp
programs into binary Transputer code. ComLisp is an imperative proper subset
of ANSI Common Lisp and serves both as a source and as an implementation language
for the compiler. The construction process of the initial compiler consists of the
following steps:

- Define the syntax and semantics of appropriate intermediate languages.

- Define the compiling specification, a relation between source and target language
programs, and prove its correctness (with respect to the language semantics)
according to a suitable correctness criterion.

- Construct a correct compiler implementation in the source language itself
(a transformational constructive approach is applied, which builds a correct
implementation from the specification by stepwise application of
correctness-preserving development steps [5]).

- Use an existing (unverified) implementation of the source language (here:
some arbitrary Common Lisp compiler) to execute the program. Apply the
program to itself and bootstrap a compiler executable. Check syntactically
that the executable code has been generated according to the compiling
specification. For this last step, a realistic technique for low-level compiler
verification has been developed, which is based on rigorous a posteriori syntactic
code inspection [8,11]. This closes the gap between the high-level implementation
and the executable code.

The size and complexity of the verification task in constructing a correct com- 
piler is immense. In order to manage it, suitable mechanized support for both 
specification and verification is necessary. We have chosen the PVS specification 
and verification system [16] to support the verification of the compiling specifi- 
cation and the construction process of a compiler implementation in the source 
language. 

In this paper, we focus on the mechanical verification of the compiling spec- 
ification for the ComLisp compiler. In particular, we describe the formalization 
and verification process of the first compilation phase from ComLisp to the stack- 
based intermediate language SIL, the first of a series of intermediate languages 
used to compile ComLisp programs into binary Transputer machine code: 

ComLisp → SIL → $C^{int}$ → TASM → TC

First, ComLisp is translated into a stack intermediate language (SIL), in which
parameter passing is implemented by a stack technique. Following the stack
principle, expressions are transformed from prefix notation into postfix
notation. SIL is then compiled into $C^{int}$, where the ComLisp data structures
(s-expressions) and operators are implemented in linear integer memory using a
run-time stack and a heap. These two steps are machine independent. In the next
step, the control structures of $C^{int}$ are implemented by linear assembler code
with jumps. Finally, abstract assembler code is transformed into binary
Transputer code.
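The following minimal Python sketch illustrates the stack principle only. It is not the ComLisp-to-SIL compiler. It translates prefix arithmetic expressions into postfix stack code and then executes that code. The expression classes, the instruction names `push` and `op`, and the restriction to `+` and `*` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical toy expressions: integer constants and binary operators.
@dataclass
class Const:
    value: int

@dataclass
class BinOp:
    op: str          # "+" or "*" in this sketch
    left: "Expr"
    right: "Expr"

Expr = Union[Const, BinOp]
Instr = Tuple[str, object]

def to_postfix(e: Expr) -> List[Instr]:
    """Emit operands first and the operator last (stack principle)."""
    if isinstance(e, Const):
        return [("push", e.value)]
    return to_postfix(e.left) + to_postfix(e.right) + [("op", e.op)]

def run(code: List[Instr]) -> int:
    """Execute postfix code on an explicit operand stack."""
    stack: List[int] = []
    for kind, arg in code:
        if kind == "push":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if arg == "+" else left * right)
    return stack.pop()

# (+ 1 (* 2 3)) in prefix notation
expr = BinOp("+", Const(1), BinOp("*", Const(2), Const(3)))
print(to_postfix(expr))       # [('push', 1), ('push', 2), ('push', 3), ('op', '*'), ('op', '+')]
print(run(to_postfix(expr)))  # 7
```

The operator is emitted only after the code for both of its operands. As a result, its arguments are always the topmost stack entries when it executes.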

This paper is organized as follows. The next section presents the formalization 
of the languages ComLisp and SIL, that is, their abstract syntax and semantics. 
Section 3 then focuses on the compilation process from ComLisp to SIL. Finally, 
Section 4 is concerned with the correctness of this compilation process. 

2 Syntax and Semantics of the Languages 

2.1 ComLisp 

A ComLisp program consists of a list of global variables, a list of possibly mutually
recursive function definitions, and a main form. ComLisp forms (expressions)
include:
the abort form;
s-expression constants;
variables and assignments;
sequential composition (progn);
conditionals and while loops;
calls of user-defined functions;
calls of built-in unary (uop) and binary (bop) ComLisp operators;
local let-blocks;
the list* operator, which constructs an s-expression list from its evaluated arguments;
a case instruction (cond);
instructions for reading from the input sequence and writing to the output.
The ComLisp operators include the standard operators for lists (e.g.
length), type predicates for the different kinds of s-expressions, and the standard
arithmetic operations (e.g. +, *, floor). The only available datatype is the type
of s-expressions. These are binary trees built with the constructor “cons”, whose
leaves are integers, characters, strings, or symbols. The set of symbols
includes T and NIL. The abstract syntax of ComLisp is given as follows:

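$$p ::= x_1, \ldots, x_k;\ f_1, \ldots, f_n;\ e$$

$$f ::= h(x_1, \ldots, x_m) \Leftarrow e$$

$$\begin{aligned}
e ::=\ & \mathit{abort} \mid c \mid x \mid x := e \mid \mathit{progn}(e_1, \ldots, e_n) \mid \mathit{if}(e_1, e_2, e_3) \mid \mathit{while}(e_1, e_2) \mid {} \\
& \mathit{call}(h, e_1, \ldots, e_n) \mid \mathit{uop}(e) \mid \mathit{bop}(e_1, e_2) \mid \mathit{let}(x_1 = e_1, \ldots, x_n = e_n; e) \mid {} \\
& \mathit{list}^*(e_1, \ldots, e_n) \mid \mathit{cond}(p_1 \rightarrow e_1, \ldots, p_n \rightarrow e_n) \mid {} \\
& \text{read-char} \mid \text{peek-char} \mid \text{print-char}(e)
\end{aligned}$$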

The static semantics of ComLisp programs, function definitions, and forms is
specified by means of several well-formedness predicates. A ComLisp form is
well-formed with respect to three environments:
a local variable environment $\zeta$ (a list of formal parameters);
a list of global variables $\gamma$;
a function environment $\Gamma$ (a list of function definitions).
It is well-formed if the lists of local and global variables are disjoint, all
variables are declared (that is, occur either in $\zeta$ or in $\gamma$), and each
user-defined function is declared in $\Gamma$ and called with the correct number of
arguments (correct parameter passing). Formally, a relation $\mathit{wf}(e, \zeta, \gamma, \Gamma)$
is defined inductively on the structure of forms (omitted here). Well-formedness
relations are defined analogously for function environments (predicate
$\mathit{wf}_{proc}(\Gamma, \gamma)$) and for programs (predicate $\mathit{wf}_{program}(p)$)
(definitions omitted).
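The well-formedness conditions can be sketched as a recursive check. This minimal Python sketch rests on three assumptions. First, it uses a hypothetical tuple encoding of a ComLisp subset; let, cond, list* and the I/O forms are left out. Second, the function environment $\Gamma$ is a dictionary from function names to lists of formal parameters. Third, it is not the PVS definition of $\mathit{wf}$.

```python
from typing import Dict, List, Sequence

# Hypothetical tuple encoding of a ComLisp subset:
#   ("const", c) | ("var", x) | ("assign", x, e) | ("progn", [e, ...])
#   | ("if", e1, e2, e3) | ("call", h, [e, ...]) | ("bop", op, e1, e2)
Form = tuple
FunEnv = Dict[str, List[str]]   # function name -> formal parameters

def wf(e: Form, zeta: Sequence[str], gamma: Sequence[str], funs: FunEnv) -> bool:
    """Sketch of wf(e, zeta, gamma, Gamma): locals and globals are disjoint,
    every variable is declared, and every call names a declared function
    with the right number of arguments."""
    if set(zeta) & set(gamma):
        return False
    declared = set(zeta) | set(gamma)

    def check(f: Form) -> bool:
        tag = f[0]
        if tag == "const":
            return True
        if tag == "var":
            return f[1] in declared
        if tag == "assign":
            return f[1] in declared and check(f[2])
        if tag == "progn":
            return all(check(g) for g in f[1])
        if tag == "if":
            return all(check(g) for g in f[1:])
        if tag == "call":
            name, args = f[1], f[2]
            return (name in funs and len(args) == len(funs[name])
                    and all(check(g) for g in args))
        if tag == "bop":
            return check(f[2]) and check(f[3])
        return False  # forms outside this sketch are rejected

    return check(e)
```

With `funs = {"double": ["n"]}`, the following results hold:
- `wf(("call", "double", [("var", "x")]), ["x"], ["g"], funs)` holds;
- the same call with no arguments violates the arity check;
- an undeclared `("var", "y")` is rejected.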
For the intermediate languages occurring in the different compilation phases
of the ComLisp to Transputer compiler, a uniform relational semantics description
has been chosen. The (dynamic) semantics of ComLisp is defined in a structural
operational way by a set of inductive rules for the different ComLisp forms.
This kind of semantics is also referred to as big-step semantics or evaluation
semantics. It contrasts with a transition semantics (small-step semantics), such
as abstract state machines (ASMs). A ComLisp state is a triple consisting of
an (infinite) input sequence (stream) of characters, an output list of characters,
and the variable state, which is a mapping from identifiers to values (s-expressions):

$$\mathit{state}_{CL} ::= \mathit{sequence}[\mathit{char}] \times \mathit{char}^* \times (\mathit{Ident} \rightarrow \mathit{SExpr})$$

ComLisp forms are expressions with side effects, that is, they denote state
transformers mapping states to pairs of result value and result state. The
definition of the semantics of forms uses the notation $\Gamma \vdash s : e \rightarrow (v, q)$.
It states that evaluating form $e$ in state $s$ and function environment $\Gamma$
terminates and results in a value $v$ and final state $q$. Given rules for each
kind of form, the semantics is defined as the smallest relation $\rightarrow$
satisfying the set of rules. For example, the semantics of a function call is
given by two rules. One rule covers parameterless functions (omitted here). The
other covers functions with parameters: the parameters are evaluated
sequentially, the resulting values are then bound to the parameters before the
body is evaluated, and the bindings are undone after the value is returned:



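$$\frac{\begin{array}{c}
[f(x_1 \ldots x_n) \Leftarrow \mathit{body}] \in \Gamma \quad (n \geq 1) \\
\Gamma \vdash q_i : e_i \rightarrow (v_i, q_{i+1}) \quad (1 \leq i \leq n) \\
\Gamma \vdash q_{n+1}[x_1 \mapsto v_1, \ldots, x_n \mapsto v_n] : \mathit{body} \rightarrow (v, r)
\end{array}}{\Gamma \vdash q_1 : \mathit{call}(f, e_1, \ldots, e_n) \rightarrow (v, r[x_1 \mapsto q_{n+1}(x_1), \ldots, x_n \mapsto q_{n+1}(x_n)])}$$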

The semantics of a ComLisp program is given by its input/output behavior,
defined by a relation $\mathit{Psem}_{CL}$ between input streams $is$ and output
lists $ol$. $\mathit{Psem}_{CL}$ holds if the main form $e$ is evaluated in an
initial state in which:
the input stream is given by $is$;
the output list is empty;
all variables are initialized with NIL;
and this evaluation terminates with a value $v$ in some state $q$ with output list $ol$.
Formally:

$$\mathit{Psem}_{CL}(p)(is, ol) ::= \exists v, q.\ (\Gamma \vdash (is, [\,], \lambda x.\mathit{NIL}) : e \rightarrow (v, q)) \wedge (q_{\mathit{output}} = ol)$$
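The call rule and the program relation can be read as a recursive evaluator. The following Python sketch is a hypothetical, executable approximation for a small subset of forms, built on these assumptions:
- forms use a tuple encoding;
- the input stream is omitted;
- a `print` form stands in for print-char, so that an output list exists.

Because it is a function rather than a relation, nontermination appears as a call that never returns (or a Python recursion error) rather than as an undefined relation.

```python
from typing import Dict, List, Tuple

NIL = "NIL"
Form = tuple
# State sketch: (output list, variable store); the input stream is omitted.
State = Tuple[List[object], Dict[str, object]]
Funs = Dict[str, Tuple[List[str], Form]]   # name -> (parameters, body)

def evaluate(funs: Funs, s: State, e: Form) -> Tuple[object, State]:
    out, store = s
    tag = e[0]
    if tag == "const":
        return e[1], s
    if tag == "var":
        return store.get(e[1], NIL), s           # all variables start as NIL
    if tag == "assign":
        v, (out, store) = evaluate(funs, s, e[2])
        return v, (out, {**store, e[1]: v})
    if tag == "progn":
        v = NIL
        for g in e[1]:
            v, s = evaluate(funs, s, g)
        return v, s
    if tag == "if":                              # NIL is false, anything else true
        c, s = evaluate(funs, s, e[1])
        return evaluate(funs, s, e[3] if c == NIL else e[2])
    if tag == "bop":
        a, s = evaluate(funs, s, e[2])
        b, s = evaluate(funs, s, e[3])
        ops = {"+": lambda x, y: x + y, "-": lambda x, y: x - y,
               "<": lambda x, y: "T" if x < y else NIL}
        return ops[e[1]](a, b), s
    if tag == "print":                           # stand-in for print-char
        v, (out, store) = evaluate(funs, s, e[1])
        return v, (out + [v], store)
    if tag == "call":
        params, body = funs[e[1]]
        vals = []
        for arg in e[2]:                         # q_i : e_i -> (v_i, q_{i+1})
            v_i, s = evaluate(funs, s, arg)
            vals.append(v_i)
        out, q = s                               # q: variable store of q_{n+1}
        bound = {**q, **dict(zip(params, vals))}
        v, (out, r) = evaluate(funs, (out, bound), body)
        # unbind: r[x_j -> q_{n+1}(x_j)] for every parameter x_j
        restored = {**r, **{x: q.get(x, NIL) for x in params}}
        return v, (out, restored)
    raise ValueError(f"form {tag} not covered by this sketch")

def psem(funs: Funs, main: Form, ol: List[object]) -> bool:
    """Psem_CL sketch: run main from the initial state, compare the output list."""
    _, (out, _) = evaluate(funs, ([], {}), main)
    return out == ol
```

For example, take a function `sum(n)` defined as `n + sum(n - 1)`, returning 0 for `n < 1`, and a main form that prints `double(21)` and then `sum(4)`. Under this sketch the main form yields the output list `[42, 10]`. After each call, the parameter `n` is reset to its value before the call, as in the rule.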

2.2 SIL 

SIL, the stack intermediate language, is a language with parameterless procedures
and s-expressions as its only datatype. Programs operate on a runtime stack
with frame-pointer-relative addresses. A SIL program consists of a list of
parameterless procedure declarations and a main statement. There are no
variables, only memory locations. The machine has statements for copying values
from the global to the local memory and vice versa. For example:
$\mathit{copy}(i, j)$ copies the content at stack-relative position $i$ to relative position $j$;
$\mathit{gcopy}(g, i)$ copies from the global memory at position $g$ to relative position $i$;
$\mathit{itef}(i, s_1, s_2)$ executes statement $s_2$ if the content of stack-relative
position $i$ is NIL, and executes $s_1$ otherwise.

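$$p ::= f_1, \ldots, f_n;\ s$$

$$f ::= h \Leftarrow s$$

$$\begin{aligned}
s ::=\ & \mathit{abort} \mid \mathit{copyc}(c, i) \mid \mathit{copy}(i, j) \mid \mathit{gcopy}(g, i) \mid \mathit{copyg}(g, i) \mid {} \\
& \mathit{itef}(i, s_1, s_2) \mid \mathit{sq}(s_1, \ldots, s_n) \mid \mathit{fcall}(h, i) \mid \mathit{uop}(i) \mid \mathit{bop}(i) \mid {} \\
& \mathit{while}(i, s_1, s_2) \mid \text{read-char}(i) \mid \text{peek-char}(i) \mid \text{print-char}(i) \mid \mathit{list}^*(n, i)
\end{aligned}$$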

The static semantics is again specified by means of well-formedness predicates for
SIL statements, SIL procedure declarations, and SIL programs (definitions omitted
here). SIL statements denote state transformers. A SIL state consists of:
the input stream;
the output list;
the global memory (a list of s-expressions);
the local memory, made up of the frame pointer $\mathit{base} : \mathit{Nat}$ and the
stack (a function from natural numbers to s-expressions).

$$\mathit{state}_{SIL} ::= \mathit{sequence}[\mathit{char}] \times \mathit{char}^* \times \mathit{SExpr}^* \times \mathit{Nat} \times (\mathit{Nat} \rightarrow \mathit{SExpr})$$

As for ComLisp, an evaluation semantics for SIL statements is defined as the
smallest relation $E \vdash s : \mathit{cmd} \rightarrow q$ satisfying the set of rules
given for the language constructs. The relation states that executing the
statement $\mathit{cmd}$ in state $s$ and SIL procedure environment $E$ (a list of
procedure declarations) is defined, terminates, and results in a new state $q$.
As for ComLisp, the semantics of a SIL program is its I/O behavior:

$$\mathit{Psem}_{SIL}(p)(is, ol) ::= \exists q.\ (E \vdash \mathit{init} : s \rightarrow q) \wedge (q_{\mathit{output}} = ol)$$

Here $E$ and $s$ are the procedure declarations and the main statement of $p$,
and the initial state is defined by $\mathit{init} ::= (is, [\,], [\mathit{NIL}, \ldots, \mathit{NIL}], 0, \lambda n.\mathit{NIL})$.
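Some of the data-movement statements can be sketched as a small interpreter. This hypothetical Python sketch models only the output list, the global memory, the frame pointer and the stack. As in $\mathit{init}$, every stack cell defaults to NIL. The tuple encoding is an assumption. So is the direction of `copyg` (from relative position $i$ to global position $g$), which is inferred only from the remark that values are copied between global and local memory in both directions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

NIL = "NIL"

@dataclass
class SilState:
    output: List[str] = field(default_factory=list)
    glob: List[object] = field(default_factory=list)        # global memory
    base: int = 0                                            # frame pointer
    stack: Dict[int, object] = field(default_factory=dict)  # absolute address -> value

    def get(self, i: int) -> object:
        """Read stack-relative position i (lambda n. NIL by default)."""
        return self.stack.get(self.base + i, NIL)

    def put(self, i: int, v: object) -> None:
        """Write stack-relative position i."""
        self.stack[self.base + i] = v

def execute(s: SilState, stmt: tuple) -> None:
    """Execute a statement of a SIL subset in place (hypothetical encoding)."""
    tag = stmt[0]
    if tag == "copyc":            # copyc(c, i): constant c to relative i
        s.put(stmt[2], stmt[1])
    elif tag == "copy":           # copy(i, j): relative i to relative j
        s.put(stmt[2], s.get(stmt[1]))
    elif tag == "gcopy":          # gcopy(g, i): global g to relative i
        s.put(stmt[2], s.glob[stmt[1]])
    elif tag == "copyg":          # copyg(g, i): assumed relative i to global g
        s.glob[stmt[1]] = s.get(stmt[2])
    elif tag == "itef":           # itef(i, s1, s2): s2 if cell i is NIL, else s1
        execute(s, stmt[3] if s.get(stmt[1]) == NIL else stmt[2])
    elif tag == "sq":             # sq(s1, ..., sn): sequential composition
        for sub in stmt[1:]:
            execute(s, sub)
    else:
        raise ValueError(f"statement {tag} not covered by this sketch")
```

Each read and write goes through `base`. The same code therefore runs unchanged in any frame, which is what frame-pointer-relative addressing is meant to achieve.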

2.3 PVS Formalization of the Languages 

The abstract syntax, static semantics, and dynamic semantics of the languages
have to be formalized in the PVS specification language. This language is based
on classical higher-order logic with a rich type system including dependent types.
In addition, the PVS system provides an interactive proof checker with considerable
automated theorem-proving capabilities. A strategy language makes it possible to
combine atomic inference steps into more powerful proof strategies. Reusable proof
methods can be defined in this way.

1. Abstract Syntax: the PVS abstract data type (ADT) construct is used. ComLisp
forms, for example, are defined by an ADT with one constructor for each kind of
form. For each ADT definition, PVS automatically generates a large theory. This
theory includes induction and reduction schemes for the ADT, termination measures,
and a set of axioms stating that the data type denotes the initial algebra defined
by the constructors. The formalizations make heavy use of library specifications.
Nevertheless, many new types, functions, and predicates must be added for the
specifications, together with lemmas (which have to be proved) for their useful
properties.

2. Static Semantics: the well-formedness predicates must be formalized. Since
each function must be total in PVS, a termination measure must be provided for
the recursive definitions. We have specified the structural size of a ComLisp form
using the reduction scheme from the ADT theory.

3. Dynamic Semantics: the rules must be represented in PVS. A set of structural
rules is represented as an inductive PVS relation. This relation combines all the
rules in a single definition $E(\Gamma)(s, e, v, q, N)$, which denotes
$\Gamma \vdash s : e \rightarrow (v, q)$. Free logical variables in the rules are
existentially quantified in the corresponding PVS relation. In general,
properties of inductive relations can be proved by rule induction. Here, however,
the definition of relation $E$ has an additional counter parameter $N$. It is
needed to formulate the induction principle required for proving the selected
notion of correctness (see Sect. 4). $N$ is decreased when entering the body of a
function or a while loop, because in these cases the forms in the antecedents of
the corresponding rules are not structurally smaller. Otherwise, $N$ is left unchanged.
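The following Python sketch illustrates why an extra counter helps. It evaluates a loop with an explicit counter argument `n`. The counter decreases when a loop body is entered and when the loop is re-entered. Those nested evaluations are therefore indexed by a smaller `n`, even though the loop form itself is not structurally smaller. This is an analogy under assumptions and not the PVS relation $E$:
- only while loops are counted; function bodies are omitted;
- the value 0 plays the role of NIL in loop conditions;
- a result of `None` means "no derivation with this counter".

```python
from typing import Dict, Optional, Tuple

NIL = "NIL"
Store = Dict[str, int]
Result = Optional[Tuple[object, Store]]

def eval_n(n: int, store: Store, e: tuple) -> Result:
    """Counter-indexed evaluation of a tiny language (hypothetical encoding)."""
    tag = e[0]
    if tag == "const":
        return e[1], store
    if tag == "var":
        return store[e[1]], store
    if tag == "sub1":                       # decrement, counter unchanged
        r = eval_n(n, store, e[1])
        return None if r is None else (r[0] - 1, r[1])
    if tag == "assign":
        r = eval_n(n, store, e[2])
        if r is None:
            return None
        v, store = r
        return v, {**store, e[1]: v}
    if tag == "while":                      # ("while", cond, body)
        r = eval_n(n, store, e[1])
        if r is None:
            return None
        c, store = r
        if c == 0:                          # 0 stands in for NIL here
            return NIL, store
        if n == 0:                          # no counter left to enter the body
            return None
        r = eval_n(n - 1, store, e[2])      # body: counter decreased
        if r is None:
            return None
        _, store = r
        return eval_n(n - 1, store, e)      # loop again: counter decreased
    raise ValueError(f"form {tag} not covered by this sketch")

loop = ("while", ("var", "x"), ("assign", "x", ("sub1", ("var", "x"))))
print(eval_n(2, {"x": 3}, loop))   # None: counter too small for 3 iterations
print(eval_n(3, {"x": 3}, loop))   # ('NIL', {'x': 0})
```

The evaluation succeeds only once the counter is at least the number of loop iterations. Any larger counter gives the same result. This mirrors the situation where a semantic judgement holds for some $N$.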

3 Compiling ComLisp to SIL 

The compilation from ComLisp to SIL generates code according to the stack
principle and translates parameter passing into statements which access the data
stack. For a given expression $e$, a sequence of SIL instructions is generated that
computes its value and stores it at the top of the stack (relative position $k$ in
the current frame). The parameters $x_1, \ldots, x_n$ of a function are stored at the
bottom of the current frame (at relative positions $0, \ldots, n-1$). A SIL function
call $\mathit{fcall}(h, i)$ increases the frame pointer base by $i$, and the base is
reset to its old value after the call. Local variables introduced by let are
represented within the current frame. A compiling function is specified for each
syntactic ComLisp category.

— $C_{form}(e, \gamma, \rho, k)$ is defined inductively on $e$. It takes the following
arguments and produces a SIL statement (definition omitted):
a form $e$;
a global environment $\gamma$ (a list of identifiers);
a compile-time environment $\rho$ (an association list which associates local
variables with relative positions in the current stack frame);
a natural number $k$ (denoting the current top of stack).

— A function definition is compiled by compiling the body in a new environment
(where the formal parameters are associated with relative positions $0, \ldots, n-1$)
with the top of stack set at position $n$. Finally, the current stack frame has
to be removed, leaving only the result on top (achieved by a copy instruction
from position $n$ to $0$).

$$C_{def}(h(x_1, \ldots, x_n) \Leftarrow e)(\gamma) ::= h \Leftarrow \mathit{sq}(C_{form}(e, \gamma, [x_1 \mapsto 0, \ldots, x_n \mapsto n-1], n), \mathit{copy}(n, 0))$$

— The compilation functions for function environments $C_{defs}(\Gamma)(\gamma)$ and
ComLisp programs $C_{prog}(p)$ are straightforward and omitted here.
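The compilation scheme can be sketched as follows. This Python sketch covers only constants, variables, assignment, progn, if, binary operators and calls. Forms and SIL statements use a hypothetical tuple encoding. The concrete code shapes are assumptions chosen to match the description above; they are not the paper's definition of $C_{form}$. The two main assumptions are:
- `bop` combines cells $k$ and $k+1$ and leaves its result at $k$;
- the address $\gamma(x)$ of a global variable is its position in the list $\gamma$.

```python
from typing import Dict, List, Tuple

Form = tuple
Stmt = tuple

def cform(e: Form, gamma: List[str], rho: Dict[str, int], k: int) -> Stmt:
    """Sketch of C_form(e, gamma, rho, k): code that leaves the value of e
    at stack-relative position k (the top of stack)."""
    tag = e[0]
    if tag == "const":
        return ("copyc", e[1], k)
    if tag == "var":                                   # local via rho, else global
        x = e[1]
        return ("copy", rho[x], k) if x in rho else ("gcopy", gamma.index(x), k)
    if tag == "assign":                                # value stays at k as result
        x = e[1]
        store = ("copy", k, rho[x]) if x in rho else ("copyg", gamma.index(x), k)
        return ("sq", cform(e[2], gamma, rho, k), store)
    if tag == "progn":                                 # every subform targets k
        return ("sq", *(cform(g, gamma, rho, k) for g in e[1]))
    if tag == "if":
        return ("sq", cform(e[1], gamma, rho, k),
                ("itef", k, cform(e[2], gamma, rho, k), cform(e[3], gamma, rho, k)))
    if tag == "bop":                                   # postfix: operands, then operator
        return ("sq", cform(e[2], gamma, rho, k), cform(e[3], gamma, rho, k + 1),
                ("bop", e[1], k))
    if tag == "call":                                  # arguments at k, k+1, ...
        args = e[2]
        return ("sq", *(cform(a, gamma, rho, k + i) for i, a in enumerate(args)),
                ("fcall", e[1], k))
    raise ValueError(f"form {tag} not covered by this sketch")

def cdef(h: str, params: List[str], body: Form, gamma: List[str]) -> Tuple[str, Stmt]:
    """Sketch of C_def: parameters at 0..n-1, body compiled with top of stack n,
    then copy(n, 0) leaves only the result at the bottom of the frame."""
    n = len(params)
    rho = {x: i for i, x in enumerate(params)}
    return (h, ("sq", cform(body, gamma, rho, n), ("copy", n, 0)))
```

A call compiled at top of stack $k$ places its arguments at $k, \ldots, k+n-1$. Executing `fcall(h, k)` moves the base by $k$, so these arguments become the new frame positions $0, \ldots, n-1$. The final `copy(n, 0)` of the callee then leaves the result at the caller's position $k$.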
4 Correctness of the Compilation Process

An appropriate notion of correct compilation for sequential imperative languages
on a concrete target processor must take the finite resource limitations of the
target architecture into account. The notion of correctness used in Verifix is the
preservation of observable behavior up to resource limitations. In our case,
correctness of the compilation process is stated as follows. Let $p$ be any
well-formed ComLisp program. Whenever the semantics of the compiled program is
defined for some input stream $is$ and output list $ol$, the semantics of $p$ is
also defined for the same $is$ and $ol$:

Theorem 1 (Correctness of Program Compilation). 

$$\forall p, is, ol.\ \mathit{wf}_{program}(p) \Rightarrow \big(\mathit{Psem}_{SIL}(C_{prog}(p))(is, ol) \Rightarrow \mathit{Psem}_{CL}(p)(is, ol)\big)$$

Unfolding $\mathit{Psem}_{SIL}$ and $\mathit{Psem}_{CL}$, we must compare the semantics
of forms with the semantics of the corresponding SIL statements. In particular,
this requires relating source and target language states. ComLisp forms denote
state transformers that map a state into a result value and a result state (if
defined), $\sigma \rightarrow_e (v, \sigma')$. SIL statements, on the other hand,
denote ordinary state transformers $s \rightarrow_s s'$. Two relations are
required. Relation $\rho_{in}$ relates ComLisp input states $\sigma$ with SIL
states $s$. Relation $\rho_{out}$ relates ComLisp output states $(v, \sigma')$
with SIL states $s'$. Figure 1 illustrates the correctness property for forms by
means of a commuting diagram.

$$\begin{array}{ccc}
\mathit{state}_{CL} \ni \sigma & \xrightarrow{\quad e \quad} & (v, \sigma') \in \mathit{SExpr} \times \mathit{state}_{CL} \\
{\scriptstyle \rho_{in}} \big\updownarrow & & \big\updownarrow {\scriptstyle \rho_{out}} \\
\mathit{state}_{SIL} \ni s & \xrightarrow{\quad C_{form}(e, \gamma, \rho, k) \quad} & s' \in \mathit{state}_{SIL}
\end{array}$$

Fig. 1. Correctness property for the compilation of ComLisp forms

The relations are parameterized with a list of global variables $\gamma$, the local
compile-time environment $\rho$, and the current top-of-stack position $k$.
Relation $\rho_{in}$ distinguishes between local and global variables:
for a variable $x$ on which $\rho$ is defined, the relative address is $\rho(x)$;
for a global variable $x$ in $\gamma$, the address is $\gamma(x)$.
Relation $\rho_{out}$ additionally requires the final value $v$ to be available at the stack
top (relative address $k$). In addition, the input streams and the output lists of
$\sigma$ and $s$ are required to correspond. The data representation relations are
defined as follows:

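$$\begin{aligned}
\rho_{in}(\gamma, \rho, k)(\sigma, s) ::=\ & [\forall x \in \mathit{dom}(\rho).\ (\rho(x) < k) \wedge (\sigma(x) = s_{local}(s_{base} + \rho(x)))] \wedge {} \\
& [\forall x \in \gamma.\ (\gamma(x) < |s_{global}|) \wedge (\sigma(x) = s_{global}(\gamma(x)))] \wedge {} \\
& (s_{input} = \sigma_{input}) \wedge (s_{output} = \sigma_{output})
\end{aligned}$$

$$\rho_{out}(\gamma, \rho, k)(v, \sigma', s') ::= (s'_{local}(s'_{base} + k) = v) \wedge \rho_{in}(\gamma, \rho, k)(\sigma', s')$$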
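Read as executable checks on concrete states, the two relations might look like the following Python sketch. Three things are assumptions made for illustration:
- the named-tuple state layouts;
- finite lists standing in for the (infinite) input streams;
- the use of a variable's index in $\gamma$ as its global address $\gamma(x)$.

```python
from typing import Dict, List, NamedTuple

NIL = "NIL"

class ClState(NamedTuple):       # ComLisp state: (input, output, variable store)
    inp: List[str]
    out: List[str]
    var: Dict[str, object]

class SilState(NamedTuple):      # SIL state: (input, output, global memory, base, stack)
    inp: List[str]
    out: List[str]
    glob: List[object]
    base: int
    stack: Dict[int, object]

def local(s: SilState, addr: int) -> object:
    """s_local(addr), with NIL as the default cell content."""
    return s.stack.get(addr, NIL)

def rho_in(gamma: List[str], rho: Dict[str, int], k: int,
           sigma: ClState, s: SilState) -> bool:
    locals_ok = all(rho[x] < k and sigma.var.get(x, NIL) == local(s, s.base + rho[x])
                    for x in rho)
    globals_ok = all(i < len(s.glob) and sigma.var.get(x, NIL) == s.glob[i]
                     for i, x in enumerate(gamma))   # gamma(x) = index of x
    return locals_ok and globals_ok and s.inp == sigma.inp and s.out == sigma.out

def rho_out(gamma: List[str], rho: Dict[str, int], k: int, v: object,
            sigma2: ClState, s2: SilState) -> bool:
    """The result v sits at the stack top k, and rho_in holds for the final states."""
    return local(s2, s2.base + k) == v and rho_in(gamma, rho, k, sigma2, s2)
```

For instance, take a local `x` at relative address 0 with `k = 1` and base 10, so that `x` lives in cell 10. The relation $\rho_{in}$ then holds as long as that cell and `x` agree. It fails for `k = 0`, because $\rho(x) < k$ no longer holds.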




A Mechanically Verified Compiling Specification for a Lisp Compiler 151 



To state the correctness property for the compilation of forms, two
additional invariants are required:

1. The first invariant ($\mathit{source\_invar?}(\zeta, \gamma)(\sigma, \sigma')$) relates ComLisp
input and output states. It ensures that identifiers belonging to neither $\zeta$
nor $\gamma$ (the local and global identifier lists) do not change their values.

2. The second one ($\mathit{invar?}(\rho, k)(s, s')$) relates input SIL states $s$ with
output SIL states $s'$. It states that

(a) the frame pointers of $s$ and $s'$ are identical;

(b) the contents of all stack cells with addresses not within the range of
the local environment $\rho$ do not change from $s$ to $s'$. In particular, this
includes all stack cells below the current stack frame.

This property is required to ensure that, for function and operator calls, the
computed values of the arguments are still available (and not overwritten)
when the operator is applied or the function body is executed.

All ingredients have now been collected to state the correctness property for
the translation of forms. The diagram in Fig. 1 has to commute in the sense
of preservation of partial program correctness. The property assumes that:
the function environment and the ComLisp form are well-formed;
the compile-time environment $\rho$ is injective and its domain is the local variable list $\zeta$;
the initial ComLisp and SIL states are related by $\rho_{in}$;
the code resulting from compiling form $e$ transforms SIL state $s$ into $s'$.
It then concludes that there exist a value $v$ and a ComLisp state $\sigma'$ such that:
$e$ evaluates in state $\sigma$ to $(v, \sigma')$;
the final ComLisp and SIL states are related by $\rho_{out}$;
the target state and source state invariants hold.

Definition 1 (Correctness Property for Form Compilation). 

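$$\begin{aligned}
\mathit{correct\text{-}prop}(\Gamma, \gamma, \zeta, \rho, k)(e) ::=\ & \forall \sigma, s, s'.\ \mathit{wf}_{proc}(\Gamma, \gamma) \wedge \mathit{wf}(e, \zeta, \gamma, \Gamma) \wedge \mathit{injective?}(\rho) \wedge (\mathit{dom}(\rho) = \zeta) \wedge {} \\
& \quad \rho_{in}(\gamma, \rho, k)(\sigma, s) \wedge (C_{defs}(\Gamma)(\gamma) \vdash s : C_{form}(e, \gamma, \rho, k) \rightarrow s') \\
& \Rightarrow \exists v, \sigma'.\ (\Gamma \vdash \sigma : e \rightarrow (v, \sigma')) \wedge \rho_{out}(\gamma, \rho, k)(v, \sigma', s') \wedge {} \\
& \quad \mathit{invar?}(\rho, k)(s, s') \wedge \mathit{source\_invar?}(\zeta, \gamma)(\sigma, \sigma')
\end{aligned}$$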

The main obligation is to prove that this property holds for each kind of form: 

Theorem 2 (Correctness of Form Compilation). 

$$\forall e, \Gamma, \gamma, \zeta, \rho, k.\ \mathit{correct\text{-}prop}(\Gamma, \gamma, \zeta, \rho, k)(e)$$

In the PVS formalization, the correctness property has an additional counter
argument $N$, in accordance with the inductive relations defining the semantics.
This additional argument is required because we prove that the target semantics
implies the source semantics, while the compilation is defined structurally on
the source language. If we were to prove the other direction, rule induction
(without a counter argument) would suffice. The PVS proof of this theorem is done
by measure induction (a variant of well-founded induction). The measure is the
lexicographic combination of the counter $N$ and the structural size of form $e$.
This measure ensures that the induction hypothesis is applicable for each kind of
form. To keep the complexity of this proof manageable, a separate compilation
theorem is introduced for each kind of form. The proof of Theorem 2 is then carried
out by case analysis and application of the compilation theorems.
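The role of the lexicographic measure can be checked on small examples. The following Python sketch relies on Python's lexicographic tuple comparison for the pair $(N, \mathit{size}(e))$. The `size` function over a hypothetical tuple encoding of forms is an assumption. It is not the definition based on the PVS reduction scheme.

```python
from typing import Tuple

def size(e: tuple) -> int:
    """Structural size of a tuple-encoded form: 1 plus the sizes of all
    sub-forms (operator names and identifiers count as 0)."""
    total = 1
    for part in e[1:]:
        if isinstance(part, tuple):
            total += size(part)
        elif isinstance(part, list):
            total += sum(size(p) for p in part)
    return total

def measure(n: int, e: tuple) -> Tuple[int, int]:
    # Python compares tuples lexicographically: first N, then size(e).
    return (n, size(e))

body = ("bop", "+", ("var", "n"), ("bop", "+", ("var", "n"), ("var", "n")))
call = ("call", "f", [("const", 1)])
N = 3
# Entering a function body: N decreases, so the measure decreases
# even though the body (size 5) is larger than the call (size 2).
print(measure(N - 1, body) < measure(N, call))        # True
# Structural sub-form: N is unchanged and the size decreases.
print(measure(N, ("var", "n")) < measure(N, body))    # True
```

Neither component decreases in every case on its own. Their lexicographic combination, however, decreases in both kinds of step, which is what the measure induction needs.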

Most of the proofs of the compilation theorems follow a similar scheme that
mirrors the structure of the correctness property (see Definition 1):

1. First, definitions must be unfolded, and the SIL statement which results from
compiling the ComLisp form must be “executed” symbolically according to
the operational SIL semantics.

2. The induction hypothesis (stated as a precondition in the compilation lemmas)
must be instantiated.

3. Instantiations must be found for the result value $v$ and result state $\sigma'$
(existentially quantified variables) of the ComLisp form.

4. The consequent part of the formula must be proved. This reduces to showing
four properties:

(a) show that form $e$ evaluates to the instantiated value and result state;

(b) show, with the help of the precondition, that the output source and target
states are related by $\rho_{out}$ (note that $\rho_{out}$ is defined by means of $\rho_{in}$);

(c) show that the target state invariant holds;

(d) show that the source state invariant holds.

PVS strategies have been defined for some of the cases of the general scheme.
These strategies enable the (nearly) automatic discharge of the respective cases.
The proofs of most of the compilation lemmas are relatively straightforward
and follow the scheme directly. However, some of the compilation theorems are
tedious, in particular the theorems for function call, let-form, and list*. They
make use of an additional lemma which relates sequences of ComLisp forms with
SIL statement sequences. Due to lack of space we cannot go into the details of
the proofs. All proofs have been carried out completely in PVS.



Statistics 

We present some statistics concerning the formalization and verification effort
for this compilation step; Table 1 summarizes the results.

First, we have extended the built-in PVS library with additional functions and
properties for lists and with a new theory for association lists (finite maps).
This library has already been reused for other verification tasks. It comprises:
7 additional PVS theories;
621 lines of PVS specification code (LOC);
139 proof obligations, including all type correctness conditions generated by the system.
These obligations are proved interactively using 1048 proof steps.

The specifications of the languages ComLisp and SIL, including the definition of
s-expressions and the corresponding unary and binary operators, involve 7 theories.

Not surprisingly, most of the effort lies in the verification of the compiling
specification: 30 proof obligations (mainly the compilation theorems) have been
proved in more than 1600 proof steps. Most of this work went into the verification
of the compilation theorems for function call, let, and list*. Strategies have been
developed for parts of the proofs. Even so, the number of manual steps is quite
high, which shows that this verification task is by no means trivial.
It is hard to estimate the amount of work invested in the final verification.
We started the verification on a smaller subset of ComLisp in order to
experiment with different styles of semantics and to find the necessary
invariants. We then extended this subset incrementally, rerunning and adapting
the proofs already accomplished. A rough estimate of the total formalization and
verification effort required for the compiling specification of all 4 compilation
phases is about 3 person-years.



Table 1. Formalization and verification statistics

                           PVS theories    LOC    proof obligations    proof steps
spec. of languages                7        759           139                575
compiling specification           1        122            36                 95
compiling verification            1        219            30               1617
list, alist library               7        621           139               1048
total                            16       1721           344               3335



Related Work 

Verification of compiler correctness is a much-studied area. It starts with the
work of McCarthy and Painter in 1967 [13], who proved a simple compiler for
arithmetic expressions correct. Many different approaches have been taken since
then, usually with mechanized support to manage the complexity of the
specifications and the proofs, for example [17, 12, 2, 14, 4, 1]. Most of these
approaches deal only with the correctness of the compiling specification. The
approach taken in the Verifix project also addresses the verification of the
implementation, down to the level of binary machine code. A further difference is
that we are concerned with the compilation of “realistic” source languages and
target architectures. Both a ComLisp implementation of the ComLisp compiler and a
binary Transputer executable are available.

Notable work in this area with mechanized support is CLInc’s verified stack
of system components, ranging from a hardware processor up to an imperative
language [14]. Both the compiling verification and the high-level implementation
(in ACL2 logic, which is a Lisp subset) have been carried out with mechanized
support using the ACL2 prover. Our compiler could be used to generate correct
binary Transputer code for this implementation.

The impressive VLISP project [10] has focused on a correct translation for
Scheme. The need to also verify the compiler implementation was acknowledged,
but this verification was explicitly left out. The proofs were accomplished
without mechanized support.

P. Curzon [4] considers the verification of the compilation of a structured 
assembly language, Vista, into code for the VIPER microprocessor using the 
HOL system. Vista is a low-level language including arithmetic operators which 
correspond directly to those available on the target architecture. 
The compilation of PROLOG into WAM has been realized through a series of
refinement steps and has been mechanically verified using the KIV system [18].
A (small-step) ASM semantics is used for the languages.

5 Concluding Remarks 

In this paper we have reported on an ongoing effort to construct a correct
bootstrap compiler from a subset of Common Lisp into binary Transputer code.
We have focused on the formal, mechanically supported verification of the
compiling specification of the first compilation phase.

The second phase is the translation from SIL to $C^{int}$, where s-expressions and
their operators are implemented in linear memory (classical data and operation
refinement). Its verification has also been completed.

Current work is concerned with the verification of the compiler back-end, namely
the compilation from $C^{int}$ into abstract Transputer assembler code TASM. The
standard control structures of $C^{int}$ must be implemented by conditional and
unconditional jumps, and the state space must be realized on the concrete
Transputer memory. Hence, this step is again a data refinement process that has
to be verified.

In the last compilation phase, abstract Transputer assembler is compiled into
binary Transputer code (TC). Its verification has already been accomplished using
established verification techniques [15]. Starting from a (low-level) base model
of the Transputer, in which programs are part of the memory, a series of
abstraction levels is constructed. These levels allow different views of the
Transputer's behavior and the separate treatment of particular aspects.

We have demonstrated that the formal, mechanized verification of a non- 
trivial compiler for a (nearly) realistic programming language into a real target 
architecture is feasible with state-of-the-art prover technology. 
Abstract. In recent years, it has been established that regular model
checking can be successfully applied to several parameterized verification
problems. However, many parameterized verification problems cannot be described
by regular languages and thus cannot be verified using regular model checking.
In this study we apply symbolic model checking using classes of languages more
expressive than the regular languages. We provide three methods for the uniform
verification of non-regular parameterized systems.



1 Introduction 

During the last two decades, several formal methods have been developed to
answer the verification problem for finite-state systems. The verification problem
asks whether a given reactive system is correct relative to some specification.
Many interesting concurrent programs are in fact finite-state. They are often,
however, given semantically in terms of a parameter $n$ representing the number
of concurrent processes. Such a schematic program really represents an infinite
family of uniformly defined programs, and such programs are often referred to as
parameterized systems. A challenging problem is to provide methods for the uniform
verification of parameterized systems, i.e., to prove correctness for all possible
programs obtained by instantiating the parameter. We refer to this problem as the
parameterized verification problem. In 1986, Apt and Kozen proved that the
parameterized verification problem is undecidable in general, even when each
instance is finite-state [3]. For specific families, however, the problem may be
solvable.

Model Checking (MC) is an automatic technique for answering the verification
problem. In this framework, specifications are usually expressed in a
propositional temporal logic, and programs are modeled by state-transition systems.
The model checking procedure performs an exhaustive search of the state space
of the system to determine whether the system satisfies the specification. The use
of exhaustive state-space exploration limits the application of model checking
to finite-state systems.

* This work was supported in part by the European Commission (FET project
ADVANCE, contract No IST-1999-29082), and carried out at the John von Neumann
Minerva Center for the Verification of Reactive Systems.
However, it is possible to perform a state-space exploration of infinite-state
systems as well, by using an implicit representation for sets of states. The
framework of model checking where sets of states are represented implicitly using
some symbolic representation is known as symbolic model checking (SMC) [11].
Symbolic model checking can be applied to infinite-state systems, although in this
case the termination of the procedure is not guaranteed.

Regular model checking is an application of symbolic model checking where
regular expressions are used to represent sets of states symbolically [21].
Regular model checking can be applied to any verification problem that is
expressible using regular languages. Such an application is successful if the
regular model checking procedure terminates. In recent years, it has been
established that regular model checking can be successfully applied to several
types of parameterized verification problems.

However, many interesting parameterized systems cannot be handled by regular
model checking, since the class of regular languages is not expressive enough
to describe them. An example of such a verification problem is the Peterson
algorithm for mutual exclusion among $n$ processes [23]. The existence of such
examples is the main motivation for this study.

In this study we apply symbolic model checking using, as the symbolic
representation, classes of languages more expressive than the regular languages.
As a first attempt, we use context-free languages at the last step of the symbolic
model checking procedure, while regular languages are used in all earlier steps.
This seemingly slight change already enables us to verify mutual exclusion for
the Peterson algorithm.

By carefully examining the model checking procedure, one can compile a list
of the requirements a class of languages must meet in order to be adequate for
symbolic model checking [21]. Such a list consists of several operations under
which the class must be effectively closed, and several questions that must be
effectively decidable for the class. We observe that $C_{dpda}$, the class of
languages accepted by deterministic pushdown automata, meets all requirements but
two:
the class is not closed under projection;
there is no known efficient algorithm to decide equivalence.
We therefore direct our effort towards finding a subclass of $C_{dpda}$ that
possesses efficient algorithms for computing projection and deciding equivalence.
This class must also satisfy the remaining requirements.

We succeed in defining a subclass of $C_{dpda}$, which we denote $C_{dpda\text{-}m}$,
for which there exists a semi-algorithm¹ to compute projection. In addition,
there exists an efficient algorithm to answer the equivalence problem for this
class, and the class satisfies all other requirements. Thus, we establish a class
which is more expressive than the class of regular languages and yet is adequate
for symbolic model checking. The Peterson example can be symbolically model
checked using languages in this class. Note that the notion of adequacy achieved
here is weaker than the one introduced in [21], because termination failure can
also occur in the computation of projection. However, due to the general
undecidability of the problem, a semi-algorithm is the best we can hope for in any
case.

¹ By semi-algorithm we mean a computational procedure that is not guaranteed to
halt, but is guaranteed to give a correct answer in all cases in which it does halt.
In practice, we use semi-algorithms by running them up to a prescribed time limit.

Recall that the standard fix-point computation in the symbolic model checking
procedure is not guaranteed to terminate when applied to infinite-state systems.
A class of languages can be practically adequate for symbolic model checking only
if it also provides means to tackle this difficulty. Indeed, for regular model
checking, many techniques to overcome this problem have been developed (see the
related work section). The common idea behind these techniques is to calculate
the effect of taking an arbitrary number of system transitions in one step, often
referred to as calculating meta-transitions or “accelerations”.
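To make the idea of a meta-transition concrete, the following Python sketch uses explicit sets of words of one fixed length in place of symbolic languages (a simplifying assumption for illustration only) and the token-passing rule $x\,1\,0\,y \to x\,0\,1\,y$ of Example 1 below. The hypothetical helper `accelerated_post` hand-codes a meta-transition that moves a token across any number of free positions at once; it is not a technique taken from the cited works. With it, the forward fixpoint stabilizes after two iterations instead of $n$.

```python
"""Sketch: one-step vs. accelerated forward reachability on explicit words."""
from typing import Callable, Iterable, Set, Tuple


def one_step_post(w: str) -> Set[str]:
    """Successors of w under the rewrite rule x10y -> x01y (a single move)."""
    return {w[:i] + "01" + w[i + 2:] for i in range(len(w) - 1) if w[i:i + 2] == "10"}


def accelerated_post(w: str) -> Set[str]:
    """Hand-written meta-transition: a token jumps over any number of 0s at once."""
    out: Set[str] = set()
    for i, c in enumerate(w):
        if c != "1":
            continue
        j = i + 1
        while j < len(w) and w[j] == "0":
            out.add(w[:i] + "0" * (j - i) + "1" + w[j + 1:])
            j += 1
    return out


def forward_fixpoint(init: Iterable[str],
                     post: Callable[[str], Set[str]]) -> Tuple[Set[str], int]:
    """Iterate reach := reach | post(reach) until nothing changes; count iterations."""
    reach, iterations = set(init), 0
    while True:
        new = reach | {v for u in reach for v in post(u)}
        iterations += 1
        if new == reach:
            return reach, iterations
        reach = new


if __name__ == "__main__":
    slow, n_slow = forward_fixpoint({"100000"}, one_step_post)
    fast, n_fast = forward_fixpoint({"100000"}, accelerated_post)
    print(n_slow, n_fast, slow == fast)  # 6 2 True
```

Both runs compute the same reachable set; only the number of fixpoint iterations differs, which is the point of acceleration.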

For a special case of the class $\mathcal{L}_{DPDA\text{-}M}$ there exists an algorithm (not merely a
semi-algorithm) to compute projection. In addition, all techniques developed for
calculating meta-transitions in regular model checking can be applied to this
subclass. This class is therefore practically adequate for symbolic model checking.
The Peterson example can also be verified using languages in this class.

Due to space limitations, further details and proofs are omitted; they can be
found in the full version of the paper.^



Related Work. Regular model checking has been advocated by [21] and [30] as a
uniform paradigm for algorithmic verification of several classes of parameterized and
infinite-state systems. The use of regular languages to express state properties goes
back to [13]. The problem of calculating meta-transitions is one of the most laborious
problems in this field of research. It has been thoroughly researched [1,2,18], resulting
in several corrective techniques, such as acceleration [24,4], calculation of the transitive
closure [20,9] and widening [22,9,15].

The study of model checking pushdown systems has recently been receiving growing
attention [12,5,10,6,7,30,16]. Nevertheless, these efforts do not pertain to this study,
for two reasons. First, we use pushdown automata to represent the set of states,
while the above work considers the pushdown automaton as the system being analyzed.
Second, and more fundamentally, all systems previously considered share a common
characteristic: their state space is represented by a regular language, whereas our
main interest is in systems whose state space cannot be represented by a regular language.

There have been other works pursuing symbolic representations that are more
expressive than regular languages. Studies by Boigelot and Wolper [29] and by Comon and
Jurski [14] give symbolic representations for configurations of systems with counters.

Perhaps the work most closely related to this research is by Bouajjani and Habermehl [8].
They define a symbolic representation, denoted CQDD, which is an extension of the
QDDs defined in [5]. While QDDs are finite automata, CQDDs are a combination of
restricted finite automata and linear constraints on the number of occurrences of
symbols. CQDDs are thus a symbolic representation that is more expressive than regular
languages. As an example, they show that CQDDs accept the language
However, to our understanding, CQDDs are not strong enough to express the Peterson
example, which we were able to verify using all our methods.

^ the full version of the paper can be found in www.wisdom.weizmann.ac.il/~dana 
2 Preliminaries

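Definition 1. Let $\Sigma$ be an alphabet. A bi-letter over $\Sigma$ is an element of $\Sigma \times \Sigma$.
A bi-word over $\Sigma$ is a string of bi-letters over $\Sigma$ (i.e., an element of $(\Sigma \times \Sigma)^*$).
A bi-language over $\Sigma$ is a set of bi-words over $\Sigma$ (i.e., a subset of $(\Sigma \times \Sigma)^*$). We
use the notations $\left[\begin{smallmatrix} a\\ b\end{smallmatrix}\right]$ and $\left[\begin{smallmatrix} u\\ v\end{smallmatrix}\right]$ to denote respectively the bi-letter $(a,b)$ and the
bi-word $\left[\begin{smallmatrix} a_1\\ b_1\end{smallmatrix}\right]\left[\begin{smallmatrix} a_2\\ b_2\end{smallmatrix}\right]\cdots\left[\begin{smallmatrix} a_n\\ b_n\end{smallmatrix}\right]$, where $u = a_1 a_2 \cdots a_n$ and $v = b_1 b_2 \cdots b_n$.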

Given a bi-language $L$ over $\Sigma$, we denote by $L{\Downarrow_1}$ ($L{\Downarrow_2}$) the projection of $L$
on the first (respectively, second) coordinate. Given a language $L$ over $\Sigma$, we
denote by $L \times \Sigma^*$ the language $\{\left[\begin{smallmatrix} w\\ u\end{smallmatrix}\right] : w \in L,\ u \in \Sigma^*,\ |u| = |w|\}$, referred to as
left lifting. Similarly, we define right lifting and use the notation $\Sigma^* \times L$. We use
the standard notations $\overline{L}$ and $L_1 \cap L_2$ to denote the complement of a language
and the intersection of two languages, respectively.
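As a small illustration of these definitions, the following Python sketch represents a bi-word as a tuple of letter pairs and works with finite (bi-)languages only; this finiteness is an assumption made for illustration, since the symbolic setting manipulates automata instead. The helper names `biword`, `project`, `left_lift` and `right_lift` are hypothetical. The last lines show how right lifting, intersection and projection combine into one backward step.

```python
"""Sketch: bi-words, projection and lifting over finite languages."""
from itertools import product
from typing import Iterable, Set, Tuple

BiWord = Tuple[Tuple[str, str], ...]


def biword(u: str, v: str) -> BiWord:
    """The bi-word with u on top and v below; both must have the same length."""
    if len(u) != len(v):
        raise ValueError("a bi-word needs coordinates of equal length")
    return tuple(zip(u, v))


def project(bilang: Iterable[BiWord], coord: int) -> Set[str]:
    """Projection of a bi-language on coordinate 1 or 2."""
    return {"".join(pair[coord - 1] for pair in bw) for bw in bilang}


def left_lift(lang: Iterable[str], sigma: str) -> Set[BiWord]:
    """L x Sigma*: words of L on top, any word of the same length below."""
    return {biword(w, "".join(u)) for w in lang for u in product(sigma, repeat=len(w))}


def right_lift(lang: Iterable[str], sigma: str) -> Set[BiWord]:
    """Sigma* x L: words of L below, any word of the same length on top."""
    return {biword("".join(u), w) for w in lang for u in product(sigma, repeat=len(w))}


# A backward step ((Sigma* x M) ∩ rho)⇓1 on a tiny relation: which words lead into "01"?
rho = {biword("10", "01"), biword("00", "00"), biword("01", "01")}
print(sorted(project(right_lift({"01"}, "01") & rho, 1)))  # ['01', '10']
```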

Definition 2. A system is a quadruple $(\Sigma, \chi, \Theta, \rho)$ where

- $\Sigma$ is a finite alphabet.

- $\chi \subseteq \Sigma^*$ is a language over $\Sigma$, denoting the set of states.

- $\Theta \subseteq \chi$ is a language over $\Sigma$, denoting the set of initial states.

- $\rho \subseteq \chi \times \chi$ is a bi-language over $\Sigma$, denoting the transition relation.



2.1 Symbolic Model Checking and Regular Model Checking 

The verification problem for a system $M$ and a property $\psi$ is to decide whether
$\psi$ holds over all computations of $M$. Model checking is an automatic technique
for answering the verification problem. In symbolic model checking (SMC), some
symbolic representation $\mathcal{C}$ is used to represent sets of states. In this framework,
the system and the property to be verified are represented as expressions in $\mathcal{C}$:
the system is defined using $\mathcal{C}$-expressions describing its components, which are
essentially sets of states or relations over sets of states, and the invariance property
is defined by an expression in $\mathcal{C}$ describing the set of states satisfying it. The model
checking procedures then operate by manipulating sets of states (instead of individual
states) through operations on the expressions of $\mathcal{C}$. Symbolic model checking can
be applied to infinite-state systems, although in this case the termination of
the procedure is not guaranteed. We refer to this problem as the convergence
problem.

Regular model checking [21,30] is an application of symbolic model checking
in which regular expressions are used as the (symbolic) assertional language for
describing sets of states. In this framework, the system's components and
the property to be verified are defined by means of regular expressions. Usually,
we assume a given alphabet $\Sigma$ to represent a local state. The set of global states
$\chi$, the set of initial states $\Theta$ and the property $\psi$ are each specified by a regular
expression (defining a regular language) over $\Sigma$. The transition relation $\rho$ is
specified by a regular expression using bi-letters over $\Sigma$ (sometimes referred to
as a bi-regular expression).
Example 1. (token-string)

Consider an array of processes that pass a token from left to right. We define the
alphabet $\Sigma = \{0, 1\}$ to denote the local state of a process, i.e., whether the process
has the token (1) or not (0). A global state of a system consisting of $n$ processes
is a word of length $n$, each letter describing the state of one process.
The set of global states $\chi$ is the language of all words of positive
length, given by the regular expression $(0+1)^+$. The set of initial states $\Theta$ is
given by the regular expression $10^*$, indicating that the leftmost process holds
the token. The transition relation $\rho$ can be specified by the bi-regular expression
$\left(\left[\begin{smallmatrix} 0\\ 0\end{smallmatrix}\right] + \left[\begin{smallmatrix} 1\\ 1\end{smallmatrix}\right]\right)^* \left[\begin{smallmatrix} 1\\ 0\end{smallmatrix}\right]\left[\begin{smallmatrix} 0\\ 1\end{smallmatrix}\right] \left(\left[\begin{smallmatrix} 0\\ 0\end{smallmatrix}\right] + \left[\begin{smallmatrix} 1\\ 1\end{smallmatrix}\right]\right)^*$ defined over $\Sigma = \{0,1\}$. Alternatively, we can
specify the transition relation by the length-preserving rewrite rule $x\,1\,0\,y \to x\,0\,1\,y$
(where $x$ and $y$ are arbitrary words in $\{0, 1\}^*$). The invariance property
stating that there is exactly one token at all times can be given by $\psi = 0^*10^*$.
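The bi-regular expression and the rewrite rule are two descriptions of the same relation. The following Python sketch checks this for all words up to a small length. It encodes each bi-letter as one character (an encoding chosen only for illustration: `a` for $\left[\begin{smallmatrix} 0\\ 0\end{smallmatrix}\right]$, `b` for $\left[\begin{smallmatrix} 1\\ 1\end{smallmatrix}\right]$, `c` for $\left[\begin{smallmatrix} 1\\ 0\end{smallmatrix}\right]$, `d` for $\left[\begin{smallmatrix} 0\\ 1\end{smallmatrix}\right]$) and matches encoded bi-words with Python's `re` module.

```python
"""Sketch: the bi-regular expression of Example 1 vs. the rewrite rule x10y -> x01y."""
import re
from itertools import product
from typing import Set, Tuple

# One character per bi-letter (top = current state, bottom = next state).
CODE = {("0", "0"): "a", ("1", "1"): "b", ("1", "0"): "c", ("0", "1"): "d"}
# ([0/0] + [1/1])* [1/0] [0/1] ([0/0] + [1/1])*
RHO = re.compile(r"[ab]*cd[ab]*")


def words(n: int):
    return ["".join(p) for p in product("01", repeat=n)]


def by_expression(n: int) -> Set[Tuple[str, str]]:
    """All pairs (u, v) of length n whose encoded bi-word matches the expression."""
    return {(u, v) for u in words(n) for v in words(n)
            if RHO.fullmatch("".join(CODE[pair] for pair in zip(u, v)))}


def by_rewriting(n: int) -> Set[Tuple[str, str]]:
    """All pairs (w, w') of length n with w' obtained by one application of x10y -> x01y."""
    return {(w, w[:i] + "01" + w[i + 2:])
            for w in words(n) for i in range(n - 1) if w[i:i + 2] == "10"}


print(sorted(by_rewriting(3)))
# [('010', '001'), ('100', '010'), ('101', '011'), ('110', '101')]
```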



3 Adequacy of Classes of Automata for SMC 



Regular model checking is an important application of symbolic model checking,
which enables the verification of several classes of parameterized and infinite-state
systems. Nevertheless, there are instances where a regular language is not
expressive enough to describe the system or the property at hand. Our aim is
to find a class of languages that is more expressive than the regular languages,
yet is adequate for symbolic model checking. Naturally, we are looking for the
largest class of languages that is still amenable to symbolic model checking.

The rich-language symbolic model checking methodology described in [21] lists
a set of minimal requirements from an assertional language, in order for it to be
adequate for symbolic model checking. We reformulate these requirements here
in terms of classes of languages rather than assertions. We classify the languages
involved in the symbolic model checking process according to the operations applied
to them. Our intention is to allow the deployment of different classes of languages
within the symbolic model checking process. We present backward and forward
model checking procedures in terms of this classification.

Let $\mathcal{M}$, $\mathcal{R}$ and $\mathcal{A}$ be three classes of languages. Let $M_\psi \in \mathcal{M}$ and $A_\psi \in \mathcal{A}$
be languages adequate for specifying the property to be verified. Let $A_\Theta \in \mathcal{A}$
and $M_\Theta \in \mathcal{M}$ be languages adequate for representing the initial states of the
system. Let $R_\rho \in \mathcal{R}$ be a bi-language adequate for representing the transition
relation of the system, augmented by the identity relation (idle transition). We
use the auxiliary languages $M_0, M_1, M_2, \ldots \in \mathcal{M}$ to represent sets of system states.
The following procedures describe backward and forward model checking:



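Procedure Backward MC

  $M_0 := \overline{M_\psi}$
  For $i = 0, 1, \ldots$ repeat
    $M_{i+1} := ((\Sigma^* \times M_i) \cap R_\rho){\Downarrow_1}$
  until $M_{i+1} = M_i$
  return $M_i \cap A_\Theta = \emptyset$

Procedure Forward MC

  $M_0 := M_\Theta$
  For $i = 0, 1, \ldots$ repeat
    $M_{i+1} := ((M_i \times \Sigma^*) \cap R_\rho){\Downarrow_2}$
  until $M_{i+1} = M_i$
  return $M_i \cap \overline{A_\psi} = \emptyset$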
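A minimal sketch of the two procedures, assuming that explicit finite sets of words of one fixed length stand in for the languages of $\mathcal{M}$, $\mathcal{A}$ and $\mathcal{R}$, so that lifting, intersection and projection collapse into set comprehensions. The instance is the token-string system of Example 1 restricted to $n$ processes; all function names are hypothetical. Because $\rho$ contains the identity, each $M_i$ is contained in $M_{i+1}$, so on a finite universe the loops terminate.

```python
"""Sketch: Backward MC and Forward MC over explicit finite sets of words."""
from itertools import product
from typing import FrozenSet, Set, Tuple

Rel = Set[Tuple[str, str]]


def pre(M: FrozenSet[str], rho: Rel) -> FrozenSet[str]:
    """((Sigma* x M) ∩ rho)⇓1: words with a rho-successor in M."""
    return frozenset(u for (u, v) in rho if v in M)


def post(M: FrozenSet[str], rho: Rel) -> FrozenSet[str]:
    """((M x Sigma*) ∩ rho)⇓2: rho-successors of words in M."""
    return frozenset(v for (u, v) in rho if u in M)


def backward_mc(M_psi: Set[str], A_theta: Set[str], rho: Rel, universe: Set[str]) -> bool:
    M = frozenset(universe - M_psi)             # M_0 := complement of M_psi
    while True:
        M_next = pre(M, rho)
        if M_next == M:
            return not (M & A_theta)            # M_i ∩ A_Theta = ∅ ?
        M = M_next


def forward_mc(M_theta: Set[str], A_psi: Set[str], rho: Rel, universe: Set[str]) -> bool:
    M = frozenset(M_theta)                      # M_0 := M_Theta
    while True:
        M_next = post(M, rho)
        if M_next == M:
            return not (M & (universe - A_psi))  # M_i ∩ complement(A_psi) = ∅ ?
        M = M_next


def token_system(n: int):
    universe = {"".join(p) for p in product("01", repeat=n)}
    rho = {(w, w) for w in universe}            # identity (idle transition)
    rho |= {(w, w[:i] + "01" + w[i + 2:])
            for w in universe for i in range(n - 1) if w[i:i + 2] == "10"}
    psi = {w for w in universe if w.count("1") == 1}   # 0*10*
    theta = {"1" + "0" * (n - 1)}                      # 10*
    return universe, rho, psi, theta


U, rho, psi, theta = token_system(5)
print(backward_mc(psi, theta, rho, U), forward_mc(theta, psi, rho, U))  # True True
```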
The classes $\mathcal{M}$, $\mathcal{R}$ and $\mathcal{A}$ are adequate for symbolic model checking if the
following requirements hold:

1. $\mathcal{R}$ is adequate for representing $\rho$, and either $\mathcal{M}$ and $\mathcal{A}$ are adequate for
specifying $\psi$ and $\Theta$ respectively (for Backward MC) or $\mathcal{M}$ and $\mathcal{A}$ are adequate
for specifying $\Theta$ and $\psi$ respectively (for Forward MC).

2. Either $\mathcal{M}$ is effectively closed under complementation (for Backward MC)
or $\mathcal{A}$ is effectively closed under complementation (for Forward MC).

3. $\mathcal{M}$ is effectively closed under lifting.

4. $\mathcal{M}$ is effectively closed under intersection with $\mathcal{R}$.

5. $\mathcal{M}$ is effectively closed under projection.

6. Either $\mathcal{M}$ is effectively closed under intersection with $\mathcal{A}$ and emptiness is
effectively decidable for $\mathcal{M}$, or $\mathcal{A}$ is effectively closed under intersection with
$\mathcal{M}$ and emptiness is effectively decidable for $\mathcal{A}$.

7. Equivalence of two languages in $\mathcal{M}$ is effectively decidable.

Assuming that $\rho$ always includes the identity relation (implementing the idling
step), it is unnecessary to require closure under union (as required in [21]).



3.1 Meeting the Requirements 

The following combinations appear to meet all the requirements:

1. Taking $\mathcal{M} = \mathcal{R} = \mathcal{A}$ to be the class of regular languages $\mathcal{L}_{FA}$. This leads to
regular model checking.

2. Taking $\mathcal{M} = \mathcal{R} = \mathcal{L}_{FA}$ and letting $\mathcal{A}$ be the class $\mathcal{L}_{DPDA}$ of languages accepted
by deterministic pushdown automata. This combination is demonstrated
in Section 4 by verifying Peterson's mutual exclusion algorithm for
an arbitrary number of processes.

A more challenging step is to take $\mathcal{M}$ to be the class $\mathcal{L}_{DPDA}$, leaving the class
$\mathcal{L}_{FA}$ for $\mathcal{R}$ and $\mathcal{A}$. Requirement 3 is met, as $\mathcal{L}_{DPDA}$ is closed under inverse
homomorphism, which is a generalization of lifting. Requirements 2, 4 and 6
are met, as $\mathcal{L}_{DPDA}$ is closed under complementation and under intersection with a
regular language, and the emptiness problem for $\mathcal{L}_{DPDA}$ is decidable. However,
requirement 5 is not met: $\mathcal{L}_{DPDA}$ is not closed under projection. Requirement
7, decidability of the equivalence problem, was an open question until recently.
In 1997 it was settled positively by Sénizergues [26]. However, the algorithm he
provides is not efficient [28].

We thus concentrate our efforts on finding a subclass of $\mathcal{L}_{DPDA}$ that is adequate
for symbolic model checking. To this end, we must find effective algorithms
to compute projection and to decide equivalence for languages in this class. In
addition, this class must satisfy the rest of the closure and decidability properties
discussed above. This is the topic of Section 5.
4 SMC Using Context-Free and Regular Languages

In this section we concentrate on the framework of symbolic model checking
where the class $\mathcal{A}$ is chosen to be $\mathcal{L}_{DPDA}$, while $\mathcal{M}$ and $\mathcal{R}$ are the class
$\mathcal{L}_{FA}$. If we choose to apply backward model checking, we can describe the
initial condition by a language in $\mathcal{L}_{DPDA}$, but then we have to settle for a
regular language for describing the property to be verified. Alternatively, we can
choose to apply forward model checking, in which case the property can be
described by a language in $\mathcal{L}_{DPDA}$, while the initial condition must be described
by a regular language.

4.1 Peterson’s Algorithm 

We focus on the verification of the Peterson algorithm for mutual exclusion 
among N processes presented in the figure below. 

The Peterson algorithm can be explained as follows. Each process $P[i]$ has a
priority variable $y[i]$, ranging over the numbers $0$ to $N-1$. In addition, there are
$N-1$ signature variables $s[1], s[2], \ldots, s[N-1]$. The domain of the signature
variables consists of the process indices $1, 2, \ldots, N$. The variable $s[j]$ holds the
signature (index) of the last process that received priority $j$. Assume process $P[i]$
has priority $j$. In order to increment its priority (to $j+1$), it must be either the
process with the highest priority, or not the last to have signed the signature $s[j]$.
A process can enter the critical section if it has the highest priority ($N-1$) and it
is not the last to have signed the signature of the highest priority, $s[N-1]$. When a
process exits the critical section, its priority is reset.



  N : natural initially N > 1
  y : array [1..N] of [0..N−1] initially y = 0
  s : array [1..N−1] of [1..N]

  $\big\|_{i=1}^{N}$ P[i] ::
      t : integer
      ℓ0: loop forever do
          ℓ1: Non-Critical
          ℓ2: for t := 1 to N−1 do
                  ℓ3: (y[i], s[t]) := (t, i)
                  ℓ4: await s[t] ≠ i ∨ ∀j ≠ i : y[j] < y[i]
          ℓ5: Critical
          ℓ6: y[i] := 0

Program Peterson(N).



4.2 Verification of Peterson’s Algorithm 

In order to apply our method, we must first model the system by languages as
defined in Definition 2. We encode a state of the Peterson system as a word of
the form

$$\underbrace{\bullet\cdots\bullet}_{y=0} \mid \underbrace{\bullet\cdots\bullet}_{y=1} \mid \underbrace{\bullet\cdots\bullet}_{y=2} \mid \cdots \mid \underbrace{\bullet\cdots\bullet}_{y=N-1} \mid \underbrace{\bullet\cdots\bullet}_{y=N}$$

where the signatures $s[1], s[2], \ldots, s[N-1]$ point to the leftmost process of partitions
$1, 2, \ldots, N-1$, respectively. All processes $P[i]$ within partition $k$ (for $k \le N-1$) have
their priority variable $y[i]$ set to $k$; the leftmost process in partition $k$ is the one
signed in the signature $s[k]$. Processes in the rightmost partition (labeled $y = N$) are
the ones that are in the critical section. Note that there are $N+1$ partitions, separated
by $N$ border markers. The set of global states can thus be given by the language
$\chi = \{w \in (\bullet + \mid)^* : \#(\mid, w) = \#(\bullet, w) \ge 1\}$. This is the language over the
alphabet $\{\bullet, \mid\}$ each of whose words has an equal number of sticks ($\mid$) and stones ($\bullet$).

The initial state can be described by the regular expression $\Theta: \bullet^* \mid^*$, in which
all processes have their priority variable $y$ set to 0 and thus are located in the
first partition.

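The transition relation $\rho$ can be specified using five length-preserving rewrite
rules, $\rho_1$, $\rho_2$, $\rho_3$, $\rho_4$ and $\rho_{id}$, defined as follows:

  $\rho_1$: $x \bullet \mid y \;\to\; x \mid \bullet\, y$ where $x \in \bullet^*$, $y \in (\mid + \bullet)^*$

  $\rho_2$: $x \mid \bullet \mid y \;\to\; x \mid \mid \bullet\, y$ where $x \in (\mid + \bullet)^*$, $y \in \mid^*$

  $\rho_3$: $x \bullet \bullet \mid y \;\to\; x \bullet \mid \bullet\, y$ where $x, y \in (\mid + \bullet)^*$

  $\rho_4$: $x \bullet \;\to\; \bullet\, x$ where $x \in (\mid + \bullet)^*$

  $\rho_{id}$: $x \;\to\; x$ where $x \in (\mid + \bullet)^*$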



The first rewrite rule states that a process can move unconditionally from the
first partition to the second partition. This corresponds to increasing the priority
variable $y$ from 0 to 1. The second rule allows a process that is alone in its
partition to move to the next partition, provided that all the partitions to its right
are empty. This corresponds to the situation where the process has the highest
priority. The third rule allows a process to move to the next partition provided
that there is a process to its left in the same partition. This corresponds to the
situation where the process is not the last to sign the signature corresponding
to its current priority. The fourth rule describes an exit from the critical
section, while the last rule captures the stuttering step.

Performing forward regular model checking, we obtain the following set of
reachable states:

$$R:\quad \bullet^* \mid (\bullet^+ \mid)^* \bullet^* \mid^* \;+\; \bullet^* \mid (\bullet^+ \mid)^* \mid^+ \bullet \mid^*.$$

The interpretation is that to the right of any empty partition ($\mid\mid$) there can
be at most one process ($\bullet$). The negation of mutual exclusion is described by
$x \bullet \bullet$ (with $x \in (\bullet + \mid)^*$), which represents the situation in which there are at least two
processes in the critical section. This has a non-empty intersection with the set of reachable
states, given by $\bullet^* \mid (\bullet^+ \mid)^* \bullet^* \bullet \bullet$. Thus, we are unable to verify the Peterson
algorithm using purely regular model checking.

However, when we express the negation of the property by an $\mathcal{L}_{DPDA}$-language,
namely the intersection $(x \bullet \bullet) \cap \chi$, where $\chi$ requires an equal number of sticks
and stones, we obtain an empty intersection. Indeed, every reachable word ending
with $\bullet\bullet$ contains more stones than sticks. Thus, by mixing regular with
context-free languages, we are able to verify mutual exclusion for the Peterson
algorithm for an arbitrary number of processes, whereas using purely regular model
checking we failed to do so. An elaborated example and further details can be
found in the full paper.^
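For small $N$ the argument can be cross-checked by explicit-state exploration. The Python sketch below uses ASCII `o` for a stone ($\bullet$) and `|` for a stick, applies the five rewrite rules $\rho_1,\ldots,\rho_4,\rho_{id}$ as given above, and checks that from $\bullet^N \mid^N$ no reachable word ends with two stones, and that every reachable word matches the regular expression $R$ (the pattern in the code is our own transcription of $R$). It also shows that from a word with fewer sticks than stones, which $\Theta = \bullet^*\mid^*$ alone does not exclude, a violation is reachable. This is a finite sanity check for a few values of $N$, not a substitute for the symbolic proof.

```python
"""Sketch: explicit-state exploration of the Peterson encoding (rules rho_1..rho_id)."""
import re
from typing import Iterator, Set

STONE, STICK = "o", "|"   # ASCII stand-ins for the stone and stick symbols

# Our transcription of R:  o*|(o+|)*o*|*  +  o*|(o+|)*|+o|*
R = re.compile(r"o*\|(?:o+\|)*o*\|*|o*\|(?:o+\|)*\|+o\|*")


def successors(w: str) -> Iterator[str]:
    yield w                                          # rho_id: stuttering step
    p = w.find(STICK)                                # rho_1: x o | y with x in o*
    if p >= 1:
        yield w[:p - 1] + STICK + STONE + w[p + 1:]
    for i in range(len(w) - 2):
        if w[i:i + 3] == "|o|" and set(w[i + 3:]) <= {STICK}:
            yield w[:i] + "||o" + w[i + 3:]          # rho_2: y in |*
        if w[i:i + 3] == "oo|":
            yield w[:i] + "o|o" + w[i + 3:]          # rho_3
    if w.endswith(STONE):
        yield STONE + w[:-1]                         # rho_4: exit the critical section


def reachable(start: str) -> Set[str]:
    seen, todo = {start}, [start]
    while todo:
        for v in successors(todo.pop()):
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen


def violates_mutex(w: str) -> bool:
    return w.endswith(STONE * 2)                     # x o o: two processes in the critical section


for n in range(1, 6):
    states = reachable(STONE * n + STICK * n)        # N processes and N sticks
    assert not any(violates_mutex(w) for w in states)
    assert all(R.fullmatch(w) for w in states)
print(sorted(w for w in reachable("oo|") if violates_mutex(w)))  # ['|oo']
```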

5 SMC Using a Cascade Product 1dpda ∘ dfa

In this section we seek a subclass of $\mathcal{L}_{DPDA}$ that is adequate for symbolic
model checking. Given a single-state dpda $M$, we define the class $\mathcal{L}_{DPDA\text{-}M}$. We
provide a semi-algorithm for computing projection for this class, and an efficient
algorithm to decide equivalence. In addition, this class is closed under all basic
operations required in order to be adequate for symbolic model checking.

5.1 The Classes $\mathcal{L}_{DPDA\text{-}M}$

We concentrate on deterministic pushdown automata with no epsilon transitions,
DPDAs [19]. We often consider single-state dpdas with an empty set of accepting
conditions, to which we refer as 1dpdas. A 1dpda $M = (\Sigma, \{q\}, q, \Gamma, \bot, \Delta, \emptyset)$
can be represented by the quadruple $(\Sigma, \Gamma, \bot, \Delta)$, where $\Delta : \Sigma \times \Gamma \to Com(\Gamma)$.
The following definitions are needed in the sequel.

Definition 3. (cascade product)

Let $R = (V \times \Gamma, S, s_0, \delta, F)$ be a dfa and $\phi : V \to \Sigma$ a substitution mapping
each letter of $V$ to a letter in $\Sigma$. The cascade product $M \circ_\phi R$ is the dpda
$(V, S, s_0, \Gamma, \bot, \rho, F)$ where $\rho(s, \sigma, z) = (\delta(s, (\sigma, z)), \Delta(\phi(\sigma), z))$.

Definition 4. (stack-consistent with $M$)

A dpda $A$ is said to be stack-consistent with $M$ ($M$-consistent) if there exist a
substitution $\phi : V \to \Sigma$ and a dfa $R = (V \times \Gamma, S, s_0, \delta, F)$ such that $A = M \circ_\phi R$.
We say that $A$ is $M$-consistent w.r.t. (with respect to) $\phi$.

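Let $A = M \circ_\phi R$. A run $\pi_A$ of $A$ on the word $w = \sigma_1 \sigma_2 \cdots \sigma_n$ can be
decomposed into two runs: a run $\pi_M$ of $M$ on $\phi(w) = \phi(\sigma_1)\phi(\sigma_2)\cdots\phi(\sigma_n)$ and a run $\pi_R$ of $R$ on $w$: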



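$$\begin{array}{ll}
\pi_A: & (s_0, \gamma_0) \xrightarrow{\sigma_1 / x_1} (s_1, \gamma_1) \xrightarrow{\sigma_2 / x_2} \cdots \xrightarrow{\sigma_n / x_n} (s_n, \gamma_n) \\
\pi_M: & \gamma_0 \xrightarrow{\phi(\sigma_1) / x_1} \gamma_1 \xrightarrow{\phi(\sigma_2) / x_2} \cdots \xrightarrow{\phi(\sigma_n) / x_n} \gamma_n \\
\pi_R: & s_0 \xrightarrow{\sigma_1,\, T(\gamma_0)} s_1 \xrightarrow{\sigma_2,\, T(\gamma_1)} \cdots \xrightarrow{\sigma_n,\, T(\gamma_{n-1})} s_n
\end{array}$$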


^ the full version of the paper can be found in www.wisdom.weizmann.ac.il/dana 
The decision upon the stack command $x_i$ at the $i$-th step is governed only by
the 1dpda $M$, which is not aware of the state $s_i$ of $R$. Yet, the dfa $R$ can look
at the top symbol on the stack of $M$ ($T(\gamma)$ denotes the top symbol of stack $\gamma$).
Automata $R$ and $M$ move simultaneously: when $A$ reads a letter $\sigma_i \in V$, the
dfa $R$ makes a move as if it were reading the pair $(\sigma_i, T(\gamma_{i-1}))$, where $\gamma_{i-1}$ is
the current stack of $M$, while $M$ makes a move as if it were reading $\phi(\sigma_i)$. The
automaton $A$ accepts a word $w$ if, when the run terminates, $R$ is in a state $s_n \in F$.
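The lock-step behaviour can be simulated directly. The Python sketch below is illustrative only: stack commands are restricted to push, pop and skip (an assumption, since $Com(\Gamma)$ is not spelled out here), `#` plays the role of $\bot$, and the 1dpda $M$ and dfa $R$ form a toy example of our own for which $M \circ_\phi R$, with $\phi$ the identity, accepts $\{a^n b^n : n \ge 1\}$. The function `cascade_run` returns the sequence $(s_0,\gamma_0), \ldots, (s_n,\gamma_n)$ of the run $\pi_A$ shown above.

```python
"""Sketch: the run of a cascade product M o_phi R, decomposed into pi_M and pi_R."""
from typing import Callable, List, Tuple

BOTTOM = "#"                      # stands for the bottom-of-stack symbol
Command = Tuple[str, ...]         # ("push", Z), ("pop",) or ("skip",): assumed command set
Stack = Tuple[str, ...]


def apply_command(stack: Stack, cmd: Command) -> Stack:
    if cmd[0] == "push":
        return stack + (cmd[1],)
    if cmd[0] == "pop":
        if len(stack) == 1:
            raise ValueError("M tried to pop the bottom symbol")
        return stack[:-1]
    return stack                  # "skip"


def cascade_run(word: str, phi: Callable[[str], str],
                delta_M: Callable[[str, str], Command],
                delta_R: Callable[[str, str, str], str],
                s0: str) -> List[Tuple[str, Stack]]:
    """Return [(s_0, gamma_0), ..., (s_n, gamma_n)]."""
    s, gamma = s0, (BOTTOM,)
    run = [(s, gamma)]
    for sigma in word:
        top = gamma[-1]                                         # T(gamma_{i-1})
        s = delta_R(s, sigma, top)                              # R reads (sigma_i, T(gamma_{i-1}))
        gamma = apply_command(gamma, delta_M(phi(sigma), top))  # M reads phi(sigma_i)
        run.append((s, gamma))
    return run


# Toy 1dpda M: push on 'a' (Z0 marks the first push), pop on 'b', skip at the bottom.
def delta_M(letter: str, top: str) -> Command:
    if letter == "a":
        return ("push", "Z0") if top == BOTTOM else ("push", "Z")
    return ("skip",) if top == BOTTOM else ("pop",)


# Toy dfa R over (letter, top of stack); missing entries lead to the sink "DEAD".
R_TABLE = {("A", "a", "#"): "A", ("A", "a", "Z0"): "A", ("A", "a", "Z"): "A",
           ("A", "b", "Z0"): "ACC", ("A", "b", "Z"): "B",
           ("B", "b", "Z"): "B", ("B", "b", "Z0"): "ACC"}


def delta_R(s: str, sigma: str, top: str) -> str:
    return R_TABLE.get((s, sigma, top), "DEAD")


def accepts(word: str) -> bool:
    return cascade_run(word, lambda c: c, delta_M, delta_R, "A")[-1][0] == "ACC"


print(cascade_run("aabb", lambda c: c, delta_M, delta_R, "A"))
```

Note that $R$ sees only the top of the stack before each move; the marker `Z0` lets it recognize the step that empties the stack.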

An automaton $A = (V, S, s_0, \Gamma, \bot, \rho, F)$ that is $M$-consistent w.r.t. $\phi$ can be
characterized by the pair $(R, \phi)$, where $R$ is the dfa such that $A = M \circ_\phi R$.

Definition 5. (The $M$-stack-consistent class, $\mathcal{L}_{DPDA\text{-}M}$)

Given a 1dpda $M$, we define the $M$-consistent class $\mathcal{L}_{DPDA\text{-}M}$ to be the class
of languages that are accepted by some $M$-consistent dpda.

An $M$-consistent automaton $A$ can be viewed as having been decomposed into a
stack manipulator, which behaves exactly like $M$ in its decisions about the stack
transformations, and a finite-state controller $R$, which determines the next state.
Two $M$-consistent automata $A_1$ and $A_2$ share the same stack manipulator and may
differ in their respective finite-state controllers, as well as in their respective
substitutions.

Claim. The class $\mathcal{L}_{DPDA\text{-}M}$ is effectively closed under complementation, lifting
and intersection with a regular language.
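One way to see why complementation and intersection with a regular language stay inside $\mathcal{L}_{DPDA\text{-}M}$ (the proof is omitted in the text, so this is our illustration, not the authors' argument): $M$ has no $\epsilon$-moves and acceptance depends only on the final state of $R$, so complementing the accepting states of a total $R$ complements the language, and the product of $R$ with a dfa $D$ that ignores the stack intersects with $\mathcal{L}(D)$. In both cases the stack manipulator $M$ is untouched. The Python sketch below assumes push/pop/skip commands and `#` for $\bot$; all names and the toy instance are hypothetical.

```python
"""Sketch: complement and intersection with a regular language inside L_DPDA-M."""
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable

BOTTOM = "#"  # stands for the bottom-of-stack symbol


@dataclass(frozen=True)
class Cascade:
    """M o_phi R, with M given by delta_M and R by (states, delta_R, s0, finals)."""
    phi: Callable[[str], str]
    delta_M: Callable[[str, str], tuple]          # (letter, top) -> ("push", Z) | ("pop",) | ("skip",)
    delta_R: Callable[[Hashable, str, str], Hashable]
    states: FrozenSet[Hashable]
    s0: Hashable
    finals: FrozenSet[Hashable]

    def accepts(self, word: str) -> bool:
        stack, s = [BOTTOM], self.s0
        for sigma in word:
            top = stack[-1]
            s = self.delta_R(s, sigma, top)
            cmd = self.delta_M(self.phi(sigma), top)
            if cmd[0] == "push":
                stack.append(cmd[1])
            elif cmd[0] == "pop":
                stack.pop()                       # assumes M never pops the bottom symbol
        return s in self.finals


def complement(A: Cascade) -> Cascade:
    """Same M and same (total) R; only the accepting set of R is complemented."""
    return Cascade(A.phi, A.delta_M, A.delta_R, A.states, A.s0, A.states - A.finals)


def intersect_regular(A: Cascade, d_states, d0, d_delta, d_finals) -> Cascade:
    """Product of R with a dfa D that reads only the input letter, not the stack."""
    def delta(state, sigma, top):
        s, d = state
        return (A.delta_R(s, sigma, top), d_delta(d, sigma))
    states = frozenset((s, d) for s in A.states for d in d_states)
    finals = frozenset((s, d) for s in A.finals for d in d_finals)
    return Cascade(A.phi, A.delta_M, delta, states, (A.s0, d0), finals)


# Toy instance: M o_id R accepts { a^n b^n : n >= 1 }.
def delta_M(letter, top):
    if letter == "a":
        return ("push", "Z0") if top == BOTTOM else ("push", "Z")
    return ("skip",) if top == BOTTOM else ("pop",)


R_TABLE = {("A", "a", "#"): "A", ("A", "a", "Z0"): "A", ("A", "a", "Z"): "A",
           ("A", "b", "Z0"): "ACC", ("A", "b", "Z"): "B",
           ("B", "b", "Z"): "B", ("B", "b", "Z0"): "ACC"}

A = Cascade(lambda c: c, delta_M, lambda s, c, z: R_TABLE.get((s, c, z), "DEAD"),
            frozenset({"A", "B", "ACC", "DEAD"}), "A", frozenset({"ACC"}))
# D accepts the words of length at most 4 (state 5 is a rejecting sink).
short = intersect_regular(A, range(6), 0, lambda d, c: min(d + 1, 5), set(range(5)))
print(short.accepts("aabb"), short.accepts("aaabbb"))  # True False
```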



Claim. Equivalence and emptiness are effectively decidable for the class
$\mathcal{L}_{DPDA\text{-}M}$.



5.2 Computing Projection 



In view of the claims above, for any 1dpda $M$ the class $\mathcal{L}_{DPDA\text{-}M}$ satisfies
requirements 1, 2, 3, 4, 6 and 7. Thus, if the class is also effectively closed under
projection, then it is adequate for symbolic model checking (together with
the class $\mathcal{L}_{FA}$). Below, we provide a semi-algorithm for computing projections.

Let $A$ be a dpda that is $M$-consistent w.r.t. $\phi$, and let $R$ be $A$'s characteristic
dfa. For simplicity we assume that the input alphabet of $A$ is $\Sigma \times \Sigma$, where $\Sigma$ is the
input alphabet of $M$, that $\phi$ is the projection on the second coordinate (so that $M$
reads $\sigma_2$ when $A$ reads $\left[\begin{smallmatrix}\sigma_1\\ \sigma_2\end{smallmatrix}\right]$), and that we wish to calculate the projection of $\mathcal{L}(A)$
on its first coordinate. That is, we would like to compute another dpda $A'$ over the
alphabet $\Sigma$ that is $M$-consistent w.r.t. the identity substitution and accepts the
projection of $\mathcal{L}(A)$ on its first coordinate.

We claim that procedure Project described below calculates the correct projection
(unless it aborts). The procedure can be explained as follows. To simplify
notation, we present $\rho$ as a relation, a subset of $S \times \Gamma \times (\Sigma \times \Sigma) \times Com(\Gamma) \times S$,
instead of a function from $S \times (\Sigma \times \Sigma) \times \Gamma$ to $S \times Com(\Gamma)$.
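Procedure Project

Input: a dpda $A = (\Sigma \times \Sigma, S, s_0, \Gamma, \bot, \rho, F)$, a positive integer $k$

Output: a dpda $A' = (\Sigma, S', s'_0, \Gamma, \bot, \rho', F')$

1. Annotation:

   $\hat{A} = (\Sigma \times \Sigma, \hat{S}, \hat{s}_0, \Gamma, \bot, \hat{\rho}, \hat{F}) := Annotate(A, k)$

2. Projection:

   $\check{A} := (\Sigma, \check{S}, \check{s}_0, \Gamma, \bot, \check{\rho}, \check{F})$ where $\check{S} = \hat{S}$, $\check{s}_0 = \hat{s}_0$, $\check{F} = \hat{F}$ and

   $\check{\rho} = \{(s, z_1, \sigma_1, x_1, s') : \exists z_2, \sigma_2, x_2 : (s, z_1, z_2, \sigma_1, \sigma_2, x_1, x_2, s') \in \hat{\rho}\}$

3. Determinization:

   $\check{R} := (\Sigma \times \Gamma, \check{S}, \check{s}_0, \check{\delta}, \check{F})$ where $s' \in \check{\delta}(s, (\sigma, z))$ iff $\exists x : (s, z, \sigma, x, s') \in \check{\rho}$

   $R' = (\Sigma \times \Gamma, S', s'_0, \delta', F') := Subset\_Construction(\check{R})$

Return $A' := M \circ_{id} R'$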



In Phase 1, procedure Project calls procedure Annotate, described below. Procedure
Annotate guesses how the 1dpda $M$ would operate when looking at the first
coordinate instead of the second. For each edge $(s, s')$ labeled by $(\left[\begin{smallmatrix}\sigma_1\\ \sigma_2\end{smallmatrix}\right], z_2, x_2)$^,
the procedure aims to find all possible stack letters $z_1$ and stack commands $x_1$
such that, if $M$ looked at the first coordinate, then when moving from state $s$ to
$s'$ it would have $z_1$ on the top of the stack and decide on the stack command $x_1$.

For each state, it records information describing the difference between the
actual stack of $M$ and the “guessed stack” (the stack of $M$ if it were looking
at the first coordinate). For this it uses the notation $(\beta_1, \beta_2)$, with the intention
that if the maximal common prefix of the actual stack and the guessed stack is
$w$, then the “guessed stack” is described by $w\beta_1$ and the actual stack is described
by $w\beta_2$. An original state $s$ of $A$ may appear more than once in $\hat{A}$, each time
labeled with a different annotation $(\beta_1, \beta_2)$.

Note that $z_2$ must be the top symbol of $w\beta_2$. The procedure chooses $z_1$
to be the top symbol of $w\beta_1$. The stack command $x_1$ is then determined by
$\Delta(\sigma_1, z_1)$. Assuming that the difference between the guessed stack and the actual
stack in state $s$ is $(\beta_1, \beta_2)$, and that the stack commands for the guessed and actual
stacks are $x_1$ and $x_2$, we can compute the new difference $(\beta'_1, \beta'_2)$ for state $s'$. Let
$\gamma_1$ and $\gamma_2$ be the results of applying $x_1$ and $x_2$ to the stack contents $w\beta_1$ and
$w\beta_2$, respectively. If $w'$ is the maximal common prefix of $\gamma_1$ and $\gamma_2$, then
$\beta'_1 = \gamma_1/w'$ and $\beta'_2 = \gamma_2/w'$ (where $u/v$ denotes the right division of $u$ by $v$, i.e.,
the word obtained from $u$ by removing its prefix $v$).

To guarantee termination, the procedure uses the second parameter, a
bound $k \in \mathbb{N}$. The procedure aborts if the length of either $\beta'_1$ or $\beta'_2$ exceeds
$k$. It may be the case that the procedure decides to abort when exploring an
unreachable edge. We say that an edge $(s, s')$ labeled by $(\left[\begin{smallmatrix}\sigma_1\\ \sigma_2\end{smallmatrix}\right], z_2, x_2)$, where $s$ is
annotated $(\beta_1, \beta_2)$, is unreachable if there is no prefix $w$ such that $w\beta_2$ is in the
reachable stack language of $s$. To avoid this situation, the procedure computes
for each state $s$ its reachable stack language. It uses a labeling function $f_{reg}$
that labels each state with its reachable stack language $\mathcal{L}(s)$. When exploring
the edge $(s, s')$, the procedure checks that $w\beta_2$ is in the reachable stack language
of $s$, given by $r = f_{reg}(s)$. The procedure updates the reachable stack language
of $s'$, $f_{reg}(s')$, to contain the set of words obtained by applying $x_2$ to a word in
$r$ whose top symbol is $z_2$.

^ If $(s, z, \sigma, x, s') \in \rho$, we say that the edge $(s, s')$ is labeled by $(\sigma, z, x)$.



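Procedure Annotate

Input: $A = (\Sigma \times \Sigma, S, s_0, \Gamma, \bot, \rho, F)$, a positive integer $k$

Output: $\hat{A} = (\Sigma \times \Sigma, \hat{S}, \hat{s}_0, \Gamma, \bot, \hat{\rho}, \hat{F})$ where $\hat{S} \subseteq S \times (\Gamma^{\le k} \times \Gamma^{\le k})$ and
$\hat{\rho} \subseteq \hat{S} \times \Gamma^2 \times \Sigma^2 \times Com(\Gamma)^2 \times \hat{S}$.

$\hat{s}_0 := (s_0, (\epsilon, \epsilon))$; $f_{reg}(\hat{s}_0) := ComputeStack\_Language(A, s_0)$

$\hat{\rho} := \emptyset$; $\hat{S} := \{\hat{s}_0\}$; $Q := \{\hat{s}_0\}$

While $Q \neq \emptyset$ do
  Pick $\hat{s} = (s, (\beta_1, \beta_2)) \in Q$
  For all $(r, r', z_1, z_2, x_1, x_2, \sigma_1, \sigma_2, \beta_1, \beta_2, \beta'_1, \beta'_2) \in ReachableTransitions$ s.t.
      $(s, z_2, \left[\begin{smallmatrix}\sigma_1\\ \sigma_2\end{smallmatrix}\right], x_2, s') \in \rho$ and $r = f_{reg}(\hat{s})$ repeat
    If $|\beta'_1| > k \lor |\beta'_2| > k$ then abort
    $\hat{s}' := (s', (\beta'_1, \beta'_2))$
    If $\hat{s}' \notin \hat{S}$ then $f_{reg}(\hat{s}') := r'$ else $f_{reg}(\hat{s}') := f_{reg}(\hat{s}') \cup r'$
    $\hat{\rho} := \hat{\rho} \cup \{(\hat{s}, z_1, z_2, \sigma_1, \sigma_2, x_1, x_2, \hat{s}')\}$; $\hat{S} := \hat{S} \cup \{\hat{s}'\}$; $Q := Q \cup \{\hat{s}'\}$
  end for all
  $Q := Q \setminus \{\hat{s}\}$
end while

$\hat{s}_0 := \{(s, (\beta_1, \beta_2)) \in \hat{S} : s = s_0\}$; $\hat{F} := \{(s, (\beta_1, \beta_2)) \in \hat{S} : s \in F\}$

end procedure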



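The complete relation between all involved components is summarized as
follows. Let $z_1, z_2 \in \Gamma$, $\sigma_1, \sigma_2 \in \Sigma$, $\beta_1, \beta_2, \beta'_1, \beta'_2 \in \Gamma^*$, $x_1, x_2 \in Com(\Gamma)$,
and let $r, r' \subseteq \Gamma^*$ be stack languages. Then $(r, r', z_1, z_2, x_1, x_2, \sigma_1, \sigma_2, \beta_1, \beta_2, \beta'_1, \beta'_2) \in ReachableTransitions$ if
and only if the following hold:

1. $\exists w \in \Gamma^*$ such that $w\beta_2 \in r$

2. $z_1 = T(w\beta_1)$ and $z_2 = T(w\beta_2)$

3. $x_1 = \Delta(\sigma_1, z_1)$ and $x_2 = \Delta(\sigma_2, z_2)$

4. $\beta'_1 = x_1(w\beta_1)/w'$ and $\beta'_2 = x_2(w\beta_2)/w'$,
   where $w' = max\_common\_prefix(x_1(w\beta_1), x_2(w\beta_2))$

5. $r' = \{x_2(u z_2) \mid u z_2 \in r\}$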
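Conditions 2 to 4 can be evaluated mechanically once a witness $w$ from condition 1 is fixed. The Python sketch below does this for a single witness; checking $w\beta_2 \in r$ (condition 1) and computing $r'$ (condition 5) are not shown. Assumptions: a stack is a string written bottom to top with `#` as $\bot$, commands are push/pop/skip, and $\Delta$ is a toy example. Feeding it the bi-letter $\left[\begin{smallmatrix} a\\ b\end{smallmatrix}\right]$ repeatedly shows how $\beta_1$ can grow without bound, which is the situation in which Annotate aborts once $|\beta'_1| > k$.

```python
"""Sketch: conditions 2-4 of ReachableTransitions for one fixed witness w."""
from typing import Callable, Tuple

Command = Tuple[str, ...]   # ("push", Z), ("pop",) or ("skip",): an assumed command set


def apply_cmd(stack: str, cmd: Command) -> str:
    """Apply a stack command; a stack is a string written bottom to top."""
    if cmd[0] == "push":
        return stack + cmd[1]
    if cmd[0] == "pop":
        if len(stack) <= 1:
            raise ValueError("cannot pop the bottom symbol")
        return stack[:-1]
    return stack            # "skip"


def max_common_prefix(u: str, v: str) -> str:
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]


def reachable_step(w: str, beta1: str, beta2: str, sigma1: str, sigma2: str,
                   delta: Callable[[str, str], Command]):
    """Return (z1, z2, x1, x2, w', beta1', beta2') for the bi-letter [sigma1; sigma2]."""
    z1, z2 = (w + beta1)[-1], (w + beta2)[-1]        # condition 2: top symbols
    x1, x2 = delta(sigma1, z1), delta(sigma2, z2)    # condition 3: M's commands
    g1, g2 = apply_cmd(w + beta1, x1), apply_cmd(w + beta2, x2)
    w_new = max_common_prefix(g1, g2)                # condition 4
    return z1, z2, x1, x2, w_new, g1[len(w_new):], g2[len(w_new):]


# Toy Delta with "#" as the bottom symbol: push Z on 'a'; pop Z on 'b' (skip at the bottom).
def delta(sigma: str, z: str) -> Command:
    if sigma == "a":
        return ("push", "Z")
    return ("pop",) if z == "Z" else ("skip",)


w, b1, b2 = "#", "", ""
for step in range(1, 4):   # keep reading the bi-letter [a; b]
    *_, w, b1, b2 = reachable_step(w, b1, b2, "a", "b", delta)
    print(step, repr(b1), repr(b2))   # |beta1| grows by one per step
```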

The search is conducted on-the-fly, starting at the initial state. For the initial
state both the actual stack and the guessed stack are $\bot$, and hence the initial state
is annotated by $(\epsilon, \epsilon)$. The reachable stack language of the initial state is
computed by calling procedure ComputeStack_Language (see, for example, [17]).
Then, for each state $\hat{s}$ labeled $f_{reg}(\hat{s}) = r$ and annotated by $(\beta_1, \beta_2)$, and for
each outgoing edge $(s, s')$ labeled by $(\left[\begin{smallmatrix}\sigma_1\\ \sigma_2\end{smallmatrix}\right], z_2, x_2)$, it annotates the state $s'$ by
$(\beta'_1, \beta'_2)$, updates $f_{reg}(s')$ to contain $r'$ and adds the annotation $z_1, x_1$ to the
edge, where $(r, r', z_1, z_2, x_1, x_2, \sigma_1, \sigma_2, \beta_1, \beta_2, \beta'_1, \beta'_2)$ satisfy the conditions of a
reachable transition.
In Phase 2, procedure Project simply projects each edge on the first coordinate:
from each edge it eliminates the second component $\sigma_2$ of the bi-letter
$\left[\begin{smallmatrix}\sigma_1\\ \sigma_2\end{smallmatrix}\right]$, the stack letter $z_2$ and the stack command $x_2$.

In Phase 3, as a first step, the procedure extracts from the pda $\check{A}$ the finite-state
automaton $\check{R}$. Note that $\check{R}$ is non-deterministic (i.e., it is an nfa): from one
state $s$ there could be two outgoing edges $(s, s')$ and $(s, s'')$ with the same label.
This non-determinism is the result of the projection performed in the previous
phase. Hence, as a second step, the algorithm applies the subset construction
and obtains the dfa $R'$.
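Phases 2 and 3 are standard automata manipulations. The Python sketch below assumes that the annotated relation $\hat{\rho}$ is given as a set of 8-tuples $(s, z_1, z_2, \sigma_1, \sigma_2, x_1, x_2, s')$, here a tiny hand-made example; it projects the edges and then determinizes the resulting nfa over letters $(\sigma, z)$ with the textbook subset construction. The commands $x_1$ are dropped by the determinization, since in $A' = M \circ_{id} R'$ the stack commands are again chosen by $M$. Function names are hypothetical.

```python
"""Sketch: Phase 2 (projection of edges) and Phase 3 (subset construction)."""
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Edge8 = Tuple[Hashable, str, str, str, str, str, str, Hashable]  # (s, z1, z2, s1, s2, x1, x2, s')
Edge5 = Tuple[Hashable, str, str, str, Hashable]                 # (s, z, sigma, x, s')


def project_edges(annotated: Set[Edge8]) -> Set[Edge5]:
    """Drop sigma_2, z_2 and x_2 from every annotated edge."""
    return {(s, z1, s1, x1, t) for (s, z1, _z2, s1, _s2, x1, _x2, t) in annotated}


def subset_construction(edges: Set[Edge5], s0: Hashable, finals: Set[Hashable]):
    """Determinize the nfa whose letters are the pairs (sigma, z)."""
    nfa: Dict[Tuple[Hashable, Tuple[str, str]], Set[Hashable]] = {}
    for (s, z, sigma, _x, t) in edges:
        nfa.setdefault((s, (sigma, z)), set()).add(t)
    letters = {letter for (_s, letter) in nfa}
    start = frozenset([s0])
    states: Set[FrozenSet] = {start}
    delta: Dict[Tuple[FrozenSet, Tuple[str, str]], FrozenSet] = {}
    todo = [start]
    while todo:
        P = todo.pop()
        for a in letters:
            Q = frozenset(t for s in P for t in nfa.get((s, a), ()))
            if not Q:
                continue      # missing transitions lead to an implicit rejecting sink
            delta[(P, a)] = Q
            if Q not in states:
                states.add(Q)
                todo.append(Q)
    dfa_finals = {P for P in states if P & finals}
    return states, delta, start, dfa_finals


annotated = {("p", "Z", "Z", "a", "a", "push", "push", "q"),
             ("p", "Z", "Z", "a", "b", "push", "pop", "r"),
             ("q", "Z", "Z", "b", "b", "pop", "pop", "f"),
             ("r", "Z", "Z", "b", "a", "pop", "push", "f")}
states, delta, start, finals = subset_construction(project_edges(annotated), "p", {"f"})
print(len(states), sorted(delta[(start, ("a", "Z"))]))  # 3 ['q', 'r']
```

The two edges leaving `p` differ only in their second coordinate, so after projection they share the label $(a, Z)$; the subset construction merges their targets into the single state $\{q, r\}$.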

In the full version of the paper, we also consider the simpler case where the
cascade product dpda ∘ dfa is degenerate, i.e., the dfa does not look at the stack
of the 1dpda. In this case the language is given by a simple product of a dpda
and a dfa.

6 Conclusions and Future Work 

Our research attempts to answer verification problems for parameterized systems
that cannot be answered using regular model checking. We presented three
methods to handle such problems. All suggested methods have been successfully
used to prove mutual exclusion of the Peterson algorithm for an arbitrary number
of processes [23] and termination of a termination detection algorithm extracted
from Ricart and Agrawala's algorithm [25].

It remains to study the relationship between these methods: is one stronger
than the other, or are they incomparable, with one working on some instances
and another on others?

In this paper we consider only safety properties. It is important to extend 
our methods to verify liveness properties as well. 

In order to apply regular model checking or any of the methods we have 
developed here to a given verification problem, the verification problem must be 
modeled by languages and bi-languages. In several cases, in order to model the 
system, we used an encoding which is an abstraction of the system. It is interest- 
ing to see if one can automatically verify the correctness of such an abstraction, 
or even more, automatically produce a correct encoding (abstraction). 
Abstract. Recently, Forster [7] proved a new lower bound on probabilistic
communication complexity in terms of the operator norm of the
communication matrix. In this paper, we want to exploit the various
relations between communication complexity of distributed Boolean functions,
geometric questions related to half space representations of these
functions, and the computational complexity of these functions in various
restricted models of computation. In order to widen the range of
applicability of Forster's bound, we start with the derivation of a generalized
lower bound. We present a concrete family of distributed Boolean
functions where the generalized bound leads to a linear lower bound on
the probabilistic communication complexity (and thus to an exponential
lower bound on the number of Euclidean dimensions needed for a successful
half space representation), whereas the old bound fails. We move
on to a geometric characterization of the well known communication
complexity class C-PP in terms of half space representations achieving a
large margin. Our characterization hints at a close connection between
the bounded error model of probabilistic communication complexity and
the area of large margin classification. In the final section of the paper,
we describe how our techniques can be used to prove exponential
lower bounds on the size of depth-2 threshold circuits (with still some
technical restrictions). Similar results can be obtained for read-k-times
randomized ordered binary decision diagrams and related models.
1 Introduction

Linear algebraic techniques play a pervasive role in the study of computational 
complexity. In particular, several matrix functions related to rank and its ro- 
bustness under various changes to the matrix have been extensively used in 
two-party communication complexity, many models of circuits, decision trees, 
branching programs, and span programs. Often such functions arise from or are 
equivalent to nice, natural geometric questions about the matrices. This paper 
studies certain geometric realizations of real matrices (with no zero entries) and 
applies results about such realizations of explicit matrices to derive lower bounds 
in communication complexity, threshold circuits, and ordered binary decision di- 
agrams. 

The main mathematical problem we investigate is as follows. Let $M \in \mathbb{R}^{X \times Y}$ be a matrix with non-zero entries. We say that $M$ can be realized by a $k$-dimensional linear arrangement if there are vectors $u_x, v_y \in \mathbb{R}^k$ for $x \in X$ and $y \in Y$ such that $\operatorname{sign}(M_{x,y}) = \operatorname{sign}(\langle u_x, v_y \rangle)$ for all $(x,y) \in X \times Y$. The two functions of interest are (i) the minimal dimension $d(M)$ of $M$, defined to be the minimal $k$ such that $M$ can be realized by a $k$-dimensional linear arrangement, and (ii) the maximal margin¹ $\mu(M)$ of $M$, defined to be the maximum, over all realizations $\{u_x, v_y : x \in X, y \in Y\}$ (of all dimensions), of

$$\min\left\{ \frac{|\langle u_x, v_y \rangle|}{L_2(u_x)\, L_2(v_y)} : x \in X,\ y \in Y \right\}.$$

Here, $L_2$ denotes the Euclidean norm.
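To make the definition of the margin concrete, here is a minimal NumPy sketch, assuming an arrangement is stored as two arrays whose rows are the vectors $u_x$ and $v_y$. The helper names `arrangement_margin` and `trivial_embedding` are hypothetical; the latter builds the "trivial embedding" described in footnote 2 ($u_x$ a standard unit vector, $v_y$ the column of signs), whose margin is $|X|^{-1/2}$.

```python
import numpy as np

def arrangement_margin(M, U, V):
    """Margin of the arrangement (rows of U are u_x, rows of V are v_y) for sign(M).

    Returns min |<u_x, v_y>| / (L2(u_x) * L2(v_y)); raises ValueError if some
    sign(<u_x, v_y>) differs from sign(M[x, y]), i.e. if (U, V) does not realize M.
    """
    G = U @ V.T                                   # G[x, y] = <u_x, v_y>
    if not np.array_equal(np.sign(G), np.sign(M)):
        raise ValueError("(U, V) does not realize the sign pattern of M")
    norms = np.outer(np.linalg.norm(U, axis=1), np.linalg.norm(V, axis=1))
    return float(np.min(np.abs(G) / norms))

def trivial_embedding(M):
    """u_x = e_x (a unit vector in R^X) and v_y = (sign M[x, y])_{x in X}."""
    U = np.eye(M.shape[0])
    V = np.sign(M).T.astype(float)                # row y of V is column y of sign(M)
    return U, V

H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H4 = np.kron(H2, H2)                              # a 4 x 4 Hadamard matrix
print(arrangement_margin(H4, *trivial_embedding(H4)))   # 0.5 = |X|^(-1/2)
```

The check on signs is deliberate: a margin is only meaningful for an arrangement that actually realizes the sign pattern of $M$.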

The minimal dimension can be interpreted in terms of rank. It is easy to see that the minimal dimension of a matrix $M \in \mathbb{R}^{X \times Y}$ is the minimal rank of any real matrix $M' \in \mathbb{R}^{X \times Y}$ such that $\operatorname{sign}(M'_{x,y}) = \operatorname{sign}(M_{x,y})$ for all $(x,y) \in X \times Y$. Note that the minimal dimension depends only on the sign pattern of $M$, not on its actual (non-zero) entries. Hence, we can ask the following question: given a sign pattern (a matrix with entries $\pm 1$), what is the smallest rank of a matrix that obeys this sign pattern (i.e., agrees with the given matrix in sign)? Stated differently, the minimal dimension of a $\pm 1$-matrix describes the robustness of its rank with respect to sign-preserving changes.

We briefly review some history of the problem. Paturi and Simon [14] introduced the model of unbounded error probabilistic communication complexity. They showed that the problem of estimating the minimal dimension of matrices with $\pm 1$ entries is essentially equivalent to estimating the unbounded error probabilistic communication complexity of the corresponding Boolean function. Alon, Frankl, and Rödl [1] showed that for almost all $n \times n$ matrices with $\pm 1$ entries, the minimal dimension is at least $\Omega(n)$, implying that for most Boolean functions the unbounded error probabilistic communication complexity is asymptotically as large as it can be (linear in the length of the input). However, proving even a superlogarithmic lower bound on the minimal dimension of an explicit matrix remained a difficult open question. Recently, Forster [7] solved this long-standing open question. He showed a general lower bound on the minimal dimension of a

¹ A simple argument shows that the number of dimensions can always be reduced to $\min\{|X|, |Y|\}$ without decreasing the margin. A simple compactness argument shows that the maximal margin always exists.
matrix in terms of its operator norm. As a corollary, he derived a lower bound of $\sqrt{n}$ on the minimal dimension of an $n \times n$ Hadamard matrix. This implies a linear lower bound on the unbounded error probabilistic communication complexity of the inner product mod 2 function. Forster's results [7] also include upper bounds on maximal margins of explicit matrices. The problem of maximal margins is motivated by the excellent empirical performance of maximal margin classifiers [6] (learning algorithms that compute the hyperplane with the largest margin on a sample and use that hyperplane to classify new instances). As outlined in [4], the status of the problem for maximal margins was quite similar to that for minimal dimensions: although the maximal margin of most matrices $M \in \{-1,+1\}^{X \times Y}$ is (provably) not substantially larger than the trivial margin (which can be shown² to be $\max\{|X|^{-1/2}, |Y|^{-1/2}\}$), the question of proving such an upper bound for explicit square matrices had been open. Forster's result gives an upper bound on the maximal margin of a matrix in terms of its operator norm. For the Hadamard matrix, this general bound yields the best possible upper bound on the maximal margin (matching the trivial lower bound).

In this paper, we first generalize Forster's bound [7] on the minimal dimension. The general bound applies to any real matrix with non-zero entries, not just to $\pm 1$ matrices. We demonstrate this generality by showing strong bounds for a class of matrices for which Forster's original bound [7] fails.

Next, we give a characterization of the communication complexity class C-PP (the analog of the well-known Turing machine complexity class PP) in terms of maximal margins. We show that C-PP is exactly the class of languages whose communication matrices (with $\pm 1$ entries) have a maximal margin of at least $2^{-\mathrm{polylog}(n)}$, where $n$ is the length of the inputs given to the two processors. An interesting ingredient of our proof is a random projection technique introduced by Arriaga and Vempala [3] in the context of learning algorithms. A result due to Halstenberg and Reischuk [10] characterizes C-PP in terms of a probabilistic one-way protocol that uses at most $2^{\mathrm{polylog}(n)}$ messages and achieves an error bound of $2^{-\mathrm{polylog}(n)}$. Here, we give a characterization of C-PP in terms of a single parameter that is directly related to the communication matrix.

Our third result consists of lower bounds on depth-2 threshold circuits in terms of the minimal dimension. Proving superpolynomial lower bounds for explicit functions when the threshold gates are allowed to use arbitrary weights is an interesting open question. Hajnal et al. [9] show an exponential lower bound for the inner product mod 2 function when all the weights used in a depth-2 threshold circuit are polynomially bounded. Here, we strengthen their result by showing that the restriction on the weights of the top gate can be removed. We use lower bounds on minimal dimensions of explicit matrices to derive exponential lower bounds for some explicit functions, including inner product mod 2. In fact, our lower bounds are exponential when the depth-2 circuit has a threshold gate (with

² To achieve margin $|X|^{-1/2}$, use the "trivial embedding" $u_x \in \{0,1\}^X$ and $v_y \in \{-1,+1\}^X$ such that $u_x(i) = 1$ iff $i = x$, and $v_y(x) = M_{x,y}$. Margin $|Y|^{-1/2}$ is obtained analogously.
unrestricted weights) at the top and either threshold gates with polynomial weights or gates computing arbitrary symmetric functions at the bottom. Our results also generalize and strengthen the results of Bruck and Smolensky [5], Krause [11], and Krause and Pudlák [12].

Our last result (included in the full paper for the sake of completeness) is a (sort of "easy-to-get") lower bound on the size of randomized ordered binary decision diagrams (randomized OBDD’s).³ Using the lower bound on the minimal dimension of a Hadamard matrix, it is easy to specify an explicit function (containing the inner product mod 2 as a subfunction) that requires exponential size randomized OBDD’s. Using a standard technique,⁴ it is not hard to get similar exponential lower bounds for even stronger non-uniform models of computation (like read-k-times randomized OBDD’s or various restricted versions of randomized oblivious branching programs).



2 Definitions and Notations 

A matrix $M \in \mathbb{R}^{X \times Y}$ induces the function $f(x,y) = M_{x,y}$ and vice versa. Since we often consider the situation in which a function $f(x,y)$ is computed by two processors (one of them with $x$ as local input, the other one with $y$ as local input), we speak of a "distributed function". We often blur the distinction between matrices and distributed functions; that is, we interpret a distributed function as a matrix (or vice versa) whenever we find it convenient.

Notions and Facts from Linear Algebra. We assume some familiarity with linear algebra and matrix theory. $S^{n-1}$ denotes the $(n-1)$-dimensional unit sphere, i.e., $S^{n-1} := \{u \in \mathbb{R}^n : L_2(u) = 1\}$. The operator norm of a matrix $M \in \mathbb{R}^{m \times n}$ is defined as follows:

$$\|M\| := \max\{L_2(Mu) : u \in S^{n-1}\} \qquad (1)$$

It is well known that for every matrix $M$ the equalities $\|M\| = \|M^\top\|$ and $\|MM^\top\| = \|M\|^2$ hold. If $M \in \mathbb{R}^{n \times n}$ is symmetric, then $\|M\|$ coincides with the largest absolute value of an eigenvalue of $M$. If $M \in \mathbb{R}^{n \times n}$ is orthogonal, then $\|M\| = 1$. If $M \in \mathbb{R}^{m \times n}$ has pairwise orthogonal columns, then $\|M\|$ coincides with the Euclidean length of the longest column vector of $M$.
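The facts above are easy to check numerically. The following sketch assumes NumPy, where `np.linalg.norm(M, 2)` returns the largest singular value, which is exactly the operator norm (1); `op_norm` is a hypothetical helper name. The last example, a matrix whose orthogonal columns have lengths $\sqrt{2}$ and $\sqrt{8}$ while both rows have length $\sqrt{5}$, shows why the statement for orthogonal columns must refer to the longest column.

```python
import numpy as np

def op_norm(M):
    """Operator norm max{L2(Mu) : u in S^{n-1}}, i.e. the largest singular value."""
    return float(np.linalg.norm(M, 2))

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((4, 4))
S = B + B.T                                       # a symmetric matrix

# ||M|| = ||M^T|| and ||M M^T|| = ||M||^2
print(np.isclose(op_norm(A), op_norm(A.T)), np.isclose(op_norm(A @ A.T), op_norm(A) ** 2))
# for symmetric S, ||S|| is the largest absolute value of an eigenvalue
print(np.isclose(op_norm(S), np.abs(np.linalg.eigvalsh(S)).max()))

# pairwise orthogonal columns of lengths sqrt(2) and sqrt(8)
M = np.array([[1.0, -2.0], [1.0, 2.0]])
print(op_norm(M))                                 # about 2.828 = sqrt(8), the longest column
print(np.linalg.norm(M, axis=1).max())            # about 2.236 = sqrt(5), the longest row
```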

Probabilistic Communication Complexity. A two-way probabilistic communication protocol is a probabilistic algorithm for two processors $\Pi_0$ and $\Pi_1$ that
³ An ordered binary decision diagram (OBDD) is a restricted model of a branching program that has been extensively studied both in complexity theory and in applications such as automatic verification. The reader interested in more information about OBDD’s is referred to the book of Ingo Wegener [16], where many pointers to the literature can also be found.

⁴ First introduced by Alon and Maass [2], later extended by Krause and Waack [13], and further elaborated by several authors (see [15], for instance).
computes a distributed function $f : \{0,1\}^n \times \{0,1\}^n \to \{-1,1\}$. Both processors have unbounded computational power. $\Pi_0$ sees only the first part, $x$, and $\Pi_1$ sees only the last part, $y$, of the input $(x,y) \in \{0,1\}^n \times \{0,1\}^n$. Obviously there has to be some communication between the two processors to calculate $f(x,y) \in \{-1,1\}$. The processors can communicate by exchanging messages $h \in \{0,1\}^*$. The computation takes place in rounds. In each round one of the processors is active: in odd rounds it is $\Pi_0$, and in even rounds it is $\Pi_1$. The active processor probabilistically (depending on the part of the input it knows and on the past messages) chooses a message according to the communication protocol. In the final round the active processor probabilistically chooses the result of the computation. In a one-way probabilistic communication protocol there are only two rounds: $\Pi_0$ randomly selects a message and sends it to $\Pi_1$, and $\Pi_1$ randomly makes the binary decision of either accepting or rejecting the input.

We say that a protocol computes the distributed function $f : \{0,1\}^n \times \{0,1\}^n \to \{-1,1\}$ with unbounded error if for all inputs $(x,y) \in \{0,1\}^n \times \{0,1\}^n$ the correct output is calculated with probability greater than $1/2$. (Since the slightest edge over random guessing already suffices, this model is sometimes called the unbounded error model.) The length of a communication protocol is $\lceil \log_2 N \rceil$, where $N$ is the number of distinct message sequences that can occur in computations that follow the protocol. The communication complexity $\mathrm{PComm}(f)$ of a distributed function $f : \{0,1\}^n \times \{0,1\}^n \to \{-1,1\}$ is the smallest length of any communication protocol that correctly computes $f$.

We briefly note that Paturi and Simon [14] have shown that, in the unbounded error model, one-way and two-way protocols have almost the same power. In fact, any probabilistic two-way protocol of length $k$ can be converted into a one-way protocol with length at most $k + 1$. Furthermore, the minimal dimension $d(f)$ of a distributed Boolean function $f(x,y)$ is closely related to its probabilistic communication complexity $\mathrm{PComm}(f)$. Paturi and Simon [14] have proven the following relation:

$$\lceil \log d(f) \rceil \le \mathrm{PComm}(f) \le \lceil \log d(f) \rceil + 1 \qquad (2)$$

We will show in this paper that the following quantity is related to the margin: given a probabilistic protocol, let $\varepsilon(x,y)$ denote the difference between the probability that $(x,y)$ is accepted and the probability that $(x,y)$ is rejected. Thus $f(x,y) = \operatorname{sign}(\varepsilon(x,y))$ for any probabilistic protocol that correctly computes $f$. The error bound of the protocol is defined as $\min_{x \in X, y \in Y} |\varepsilon(x,y)|$. The class C-PP consists of all functions $f$ that can be correctly computed by a probabilistic communication protocol of length $\mathrm{polylog}(n)$ with error bound $2^{-\mathrm{polylog}(n)}$. Halstenberg and Reischuk [10] have shown that the class C-PP does not change if we allow only one-way protocols. The class C-PP is one of the central classes of the bounded-error model of probabilistic communication complexity.

3 Bounds on the Dimension and on the Margin 

We first briefly review some of the main results in [7] and [8]. Thereafter, we present a new lower bound on the minimal dimension that generalizes the old bound from [7]. The generalized bound can lead to exponential lower bounds in cases where the old bound fails.



Known Bounds. Recently the following lower bound on the minimal dimension 
was proven: 

Theorem 1 (Forster [7]). For each matrix $M \in \{-1,+1\}^{X \times Y}$:

$$d(M) \ge \frac{\sqrt{|X|\,|Y|}}{\|M\|}.$$

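Corollary 1 (Forster [7]). Let $M \in \{-1,+1\}^{X \times Y}$ and $M' \in \mathbb{R}^{X \times Y}$ such that $\operatorname{sign}(M_{x,y}) = \operatorname{sign}(M'_{x,y})$ for all $x \in X, y \in Y$. Then the following holds:

$$\mathrm{PComm}(M) \ge \log \frac{\sqrt{|X|\,|Y|}}{\|M\|}, \qquad \mathrm{rank}(M') \ge \frac{\sqrt{|X|\,|Y|}}{\|M\|}.$$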

The lower bound on $\mathrm{PComm}(M)$ follows directly from Theorem 1 and (2). The lower bound on $\mathrm{rank}(M')$ follows from the well-known fact that each matrix of rank $k$ can be realized by a $k$-dimensional linear arrangement and vice versa.

As outlined in Forster [7], Theorem 1 allows one to present concrete families of distributed Boolean functions whose minimal dimension grows exponentially in $n$. Thus, their probabilistic communication complexity grows linearly in $n$. Here is the example given in [7]:

Example 1. Let $ip_n(x,y) := (-1)^{x^\top y}$, where $x, y \in \mathbb{Z}_2^n$, be the inner product mod 2 function. It is well known that the corresponding matrix $H$ with $H_{x,y} = ip_n(x,y)$ is a Hadamard matrix. $H$ has operator norm $2^{n/2}$ because it has orthogonal columns and each column vector has Euclidean length $2^{n/2}$. According to Theorem 1 and Corollary 1:

$$d(ip_n) \ge 2^{n/2} \quad \text{and} \quad \mathrm{PComm}(ip_n) \ge n/2 \qquad (3)$$
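Example 1 can be reproduced numerically for small $n$. The sketch below, assuming NumPy, builds the matrix of $ip_n$ by enumerating $\{0,1\}^n$ in lexicographic order (the helper names `inner_product_matrix` and `forster_bound` are hypothetical) and evaluates the bound of Theorem 1, $\sqrt{|X|\,|Y|}/\|M\|$.

```python
import numpy as np
from itertools import product

def inner_product_matrix(n):
    """H[x, y] = ip_n(x, y) = (-1)^(x^T y mod 2) for x, y in {0,1}^n."""
    pts = np.array(list(product([0, 1], repeat=n)))
    return np.where((pts @ pts.T) % 2 == 0, 1.0, -1.0)

def forster_bound(M):
    """Theorem 1: d(M) >= sqrt(|X| |Y|) / ||M|| for a matrix with entries +-1."""
    X, Y = M.shape
    return np.sqrt(X * Y) / np.linalg.norm(M, 2)

for n in range(1, 7):
    H = inner_product_matrix(n)
    # orthogonal columns, each of squared length 2^n, hence ||H|| = 2^(n/2)
    assert np.allclose(H.T @ H, 2 ** n * np.eye(2 ** n))
    print(n, np.linalg.norm(H, 2), forster_bound(H))   # both equal 2^(n/2)
```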



Forster, Schmitt, and Simon [8] proved the following upper bound on the 
maximal margin: 



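Theorem 2 (Forster, Schmitt, Simon). For each matrix $M \in \mathbb{R}^{X \times Y}$ with no zero entries:

$$\mu(M) \le \frac{\sqrt{|Y|}\,\|M\|}{\sqrt{\sum_{x \in X} \left( \sum_{y \in Y} |M_{x,y}| \right)^2}}.$$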



Note that Theorem 2 implies the following upper bound on the maximal margin of the inner product mod 2 function:⁵

$$\mu(ip_n) \le 2^{-n/2}. \qquad (4)$$
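As a sanity check of (4), the following sketch evaluates the margin bound in the form $\sqrt{|Y|}\,\|M\| / \sqrt{\sum_{x}(\sum_{y}|M_{x,y}|)^2}$ for the inner product matrices. It assumes NumPy, and `fss_margin_bound` is a hypothetical helper name. For $ip_n$ the value equals $2^{-n/2}$, which is also the trivial margin $|X|^{-1/2}$, so the bound is tight there. For the all-ones matrix, whose maximal margin is 1, the bound also evaluates to 1.

```python
import numpy as np
from itertools import product

def fss_margin_bound(M):
    """mu(M) <= sqrt(|Y|) * ||M|| / sqrt(sum_x (sum_y |M[x, y]|)^2)."""
    _, Y = M.shape
    row_sums = np.abs(M).sum(axis=1)              # sum_y |M[x, y]| for each x
    return np.sqrt(Y) * np.linalg.norm(M, 2) / np.sqrt(np.sum(row_sums ** 2))

def inner_product_matrix(n):
    pts = np.array(list(product([0, 1], repeat=n)))
    return np.where((pts @ pts.T) % 2 == 0, 1.0, -1.0)

for n in range(1, 7):
    # bound (4): 2^(-n/2), which equals the trivial margin |X|^(-1/2)
    print(n, fss_margin_bound(inner_product_matrix(n)), 2 ** (-n / 2))
```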

⁵ This upper bound had already been shown by Forster [7]. He derived it from a theorem that is slightly weaker than Theorem 2.
A Generalized Lower Bound on the Minimal Dimension. Theorem 1 allowed us to derive an exponential lower bound on $d(ip_n)$ because the matrix induced by $ip_n$, the Hadamard matrix, has $2^n$ orthogonal columns. For other matrices, however, Theorem 1 may fail to certify exponential lower bounds that actually hold. For this reason, we present a generalized lower bound that brings more functions within the reach of our machinery:

Theorem 3. For each matrix $M \in \mathbb{R}^{X \times Y}$ with no zero entries:

$$d(M) \ge \frac{\sqrt{|X|\,|Y|}}{\|M\|} \cdot \min_{x \in X,\, y \in Y} |M_{x,y}|.$$



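Proof. Assume that there are vectors $u_x, v_y \in \mathbb{R}^k$ such that $\operatorname{sign}(M_{x,y}) = \operatorname{sign}(\langle u_x, v_y \rangle)$ for all $x \in X$, $y \in Y$. Obviously we can assume that $\min_{x \in X, y \in Y} |M_{x,y}| = 1$. It has been shown by Forster [7] that a given linear arrangement $u_x, v_y$ can be modified such that $u_x, v_y \in S^{k-1}$ and

$$\sum_{x \in X} u_x u_x^\top = \frac{|X|}{k} I_k.$$

Now we have for all $y \in Y$ that

$$\sum_{x \in X} M_{x,y} \langle u_x, v_y \rangle \ge \sum_{x \in X} \langle u_x, v_y \rangle^2 = v_y^\top \left( \sum_{x \in X} u_x u_x^\top \right) v_y = \frac{|X|}{k}, \qquad (5)$$

where the inequality holds termwise because $M_{x,y} \langle u_x, v_y \rangle = |M_{x,y}| \cdot |\langle u_x, v_y \rangle| \ge |\langle u_x, v_y \rangle| \ge \langle u_x, v_y \rangle^2$ (recall that $|M_{x,y}| \ge 1$ and $|\langle u_x, v_y \rangle| \le 1$ for unit vectors). It is not hard to show⁶ that

$$|Y| \left( \frac{|X|}{k} \right)^2 \le \sum_{y \in Y} \left( \sum_{x \in X} M_{x,y} \langle u_x, v_y \rangle \right)^2 \le |X| \cdot \|M\|^2.$$

Thus $k \ge \sqrt{|X|\,|Y|} / \|M\|$, which is the claimed bound because the minimum of $|M_{x,y}|$ was normalized to 1. □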
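Footnote 6 defers the second inequality to the full paper. One way to obtain it is the following Cauchy-Schwarz sketch; it is our own reconstruction, not necessarily the manipulation used in the full paper. It uses only $L_2(u_x) = L_2(v_y) = 1$, and the column vectors $U_i := (u_x(i))_{x \in X} \in \mathbb{R}^X$, $i = 1, \ldots, k$, are notation introduced here for illustration:

$$
\begin{aligned}
\sum_{y \in Y} \Big( \sum_{x \in X} M_{x,y} \langle u_x, v_y \rangle \Big)^2
&= \sum_{y \in Y} \Big\langle \sum_{x \in X} M_{x,y} u_x,\ v_y \Big\rangle^2
 \le \sum_{y \in Y} L_2\Big( \sum_{x \in X} M_{x,y} u_x \Big)^2 \\
&= \sum_{i=1}^{k} L_2\big( M^\top U_i \big)^2
 \le \|M\|^2 \sum_{i=1}^{k} L_2(U_i)^2
 = \|M\|^2 \sum_{x \in X} L_2(u_x)^2 = |X| \cdot \|M\|^2 .
\end{aligned}
$$

The first step uses Cauchy-Schwarz with $L_2(v_y) = 1$, and the inequality in the second line uses $\|M^\top\| = \|M\|$. The first inequality of the footnote is obtained by squaring (5), whose right-hand side is positive, and summing over $y \in Y$.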



4 An Application of the Generalized Lower Bound 

We would like to demonstrate (by means of an example) the greater flexibility provided by Theorem 3. Let $\mathbb{F}_p$ be the prime field of characteristic $p$, let $\mathbb{F}_p^n$ be the $n$-dimensional vector space over $\mathbb{F}_p$, and let $\mathbb{P}_{n-1}(\mathbb{F}_p)$ be the $(n-1)$-dimensional projective space. Recall that the elements (projective points) of $\mathbb{P}_{n-1}(\mathbb{F}_p)$ are the 1-dimensional linear subspaces of $\mathbb{F}_p^n$. We assume that the projective points are given by their homogeneous coordinates. We say that a projective point $Q = (q_1, \ldots, q_n)$ is orthogonal to a projective point $Q' = (q'_1, \ldots, q'_n)$, denoted $Q \perp Q'$, if $\sum_{i=1}^n q_i q'_i = 0$ (in $\mathbb{F}_p$). The Boolean function $\mathrm{ORT}_{p,n} : \mathbb{P}_{n-1}(\mathbb{F}_p) \times \mathbb{P}_{n-1}(\mathbb{F}_p) \to \{-1, 1\}$,

$$\mathrm{ORT}_{p,n}(Q,Q') := \begin{cases} 1, & \text{if } Q \perp Q' \\ -1, & \text{otherwise} \end{cases} \qquad (6)$$



⁶ The first inequality follows directly from (5). The second inequality follows from some algebraic manipulations that are found in the full paper.
is the indicator function of the orthogonality relation. In the full paper, we show that the operator norm of $\mathrm{ORT}_{p,n}$ is too large to certify an exponential lower bound on the minimal dimension with Theorem 1. In contrast, the following can be shown by means of Theorem 3 and Theorem 2 (and by means of a nice trick from [13]):

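Theorem 4. Asymptotically, the minimal dimension and the maximal margin of $\mathrm{ORT}_{p,n}$ are bounded as follows:

$$d(\mathrm{ORT}_{p,n}) \ge p^{n/2 - 1} (1 - o(1)),$$

$$\sqrt{\frac{p-1}{p}} \cdot \frac{1}{p^{(n-1)/2}} (1 - o(1)) \le \mu(\mathrm{ORT}_{p,n}) \le \frac{\sqrt{p}}{2\, p^{(n-1)/2}} (1 + o(1)).$$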



Proof. Our proof strategy (borrowed from [13]) is to define a function (matrix) $\mathrm{ORT}'_{p,n}$ that has the same sign pattern as $\mathrm{ORT}_{p,n}$ and no entry of absolute value smaller than 1, so that the pair satisfies the requirements of Theorem 3, and whose columns are orthogonal (which makes $\|\mathrm{ORT}'_{p,n}\|$ small and, therefore, the lower bound on $d(\mathrm{ORT}_{p,n})$ large). Details follow.

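For

$$w_+ := p - 1 \quad \text{and} \quad w_- := \frac{1 + p^{(n-2)/2}}{p^{(n-2)/2}},$$

$\mathrm{ORT}'_{p,n}$ is given by

$$\mathrm{ORT}'_{p,n}(Q,Q') := \begin{cases} w_+, & \text{if } Q \perp Q' \\ -w_-, & \text{otherwise} \end{cases}$$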

Obviously, $|\mathrm{ORT}'_{p,n}(Q,Q')| \ge 1$ and $\operatorname{sign}(\mathrm{ORT}'_{p,n}(Q,Q')) = \operatorname{sign}(\mathrm{ORT}_{p,n}(Q,Q'))$ for all $Q, Q'$. We may therefore apply Theorem 3 and Theorem 2 to $\mathrm{ORT}'_{p,n}$ (instead of $\mathrm{ORT}_{p,n}$). It has been shown by Krause and Waack [13] that $\mathrm{ORT}'_{p,n}$ (viewed as a matrix) has orthogonal columns. Its operator norm therefore coincides with the Euclidean length of one of its column vectors (all of which have the same length; since the matrix is symmetric, these are also its row vectors). The following facts are well known: $N := |\mathbb{P}_{n-1}(\mathbb{F}_p)| = (p^n - 1)/(p - 1)$. Furthermore, each projective point is orthogonal to $N_+ = (p^{n-1} - 1)/(p - 1)$ projective points and non-orthogonal to the remaining $N_- = N - N_+ = p^{n-1}$ projective points. We conclude that

$$\|\mathrm{ORT}'_{p,n}\| = \sqrt{N_+ w_+^2 + N_- w_-^2} = p^{n/2} + 1.$$



According to Theorem 3:

$$d(\mathrm{ORT}_{p,n}) \ge \frac{\sqrt{N \cdot N}}{\|\mathrm{ORT}'_{p,n}\|} = \frac{p^{n/2} - 1}{p - 1}.$$



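According to Theorem 2 and by straightforward computation,

$$\mu(\mathrm{ORT}_{p,n}) \le \frac{\|\mathrm{ORT}'_{p,n}\| \cdot \sqrt{N}}{\sqrt{\sum_{Q} \left( \sum_{Q'} |\mathrm{ORT}'_{p,n}(Q,Q')| \right)^2}} = \frac{p^{n/2} + 1}{2p^{n-1} + p^{n/2} - 1}.$$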



The trivial margin (obtained from the trivial embedding that was briefly described in the introduction) is $N^{-1/2} = ((p-1)/(p^n-1))^{1/2}$. □
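The proof of Theorem 4 can be checked numerically for small parameters. This is a minimal sketch, assuming NumPy, a prime $p$, and projective points represented by vectors whose first non-zero coordinate is 1; the helper names are hypothetical. It builds the matrix with entries $w_+ = p - 1$ on orthogonal pairs and $-w_-$ elsewhere, where $w_- = (1 + p^{(n-2)/2})/p^{(n-2)/2}$. It then verifies the orthogonality of the columns, the norm $p^{n/2} + 1$, the value $(p^{n/2} - 1)/(p - 1)$ of the dimension bound, and the value of the margin bound.

```python
import numpy as np
from itertools import product

def projective_points(p, n):
    """One representative per point of P_{n-1}(F_p): first non-zero coordinate equal to 1."""
    pts = [v for v in product(range(p), repeat=n)
           if any(v) and next(c for c in v if c != 0) == 1]
    return np.array(pts)

def ort_prime(p, n):
    """ORT'_{p,n}(Q, Q') = w_+ if Q is orthogonal to Q', and -w_- otherwise (p prime)."""
    P = projective_points(p, n)
    orth = (P @ P.T) % p == 0
    w_plus = p - 1.0
    w_minus = (1 + p ** ((n - 2) / 2)) / p ** ((n - 2) / 2)
    return np.where(orth, w_plus, -w_minus)

for p, n in [(2, 3), (3, 3), (2, 4), (3, 4), (5, 3)]:
    M = ort_prime(p, n)
    N = M.shape[0]
    G = M.T @ M
    off_diag = np.abs(G - np.diag(np.diag(G))).max()      # ~0: orthogonal columns
    norm = np.linalg.norm(M, 2)                           # ~p^(n/2) + 1
    row_sums = np.abs(M).sum(axis=1)
    margin_bound = np.sqrt(N) * norm / np.sqrt(np.sum(row_sums ** 2))
    print(p, n, N, off_diag, norm, N / norm, margin_bound)
```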
5 Large Margins and Bounded Error

Recall that a family $(f_n)$ of Boolean functions $f_n : \{0,1\}^n \times \{0,1\}^n \to \{-1,+1\}$ belongs to C-PP if there exists a probabilistic one-way protocol that transmits at most $\mathrm{polylog}(n)$ bits (uses at most $2^{\mathrm{polylog}(n)}$ messages) and achieves error bound $2^{-\mathrm{polylog}(n)}$.

The goal of this section is to express membership in C-PP in terms of only 
one parameter: the maximal margin. Here is the main result: 

Theorem 5. $(f_n) \in$ C-PP $\iff \mu(f_n) \ge 2^{-\mathrm{polylog}(n)}$.

Theorem 5 follows directly from Lemmas 1, 2, and 3 below. Lemma 2 makes use of the random projection technique that was introduced by Arriaga and Vempala [3]. Lemmas 1 and 3 are implicitly proven in the paper of Paturi and Simon [14] on probabilistic communication complexity with unbounded error. Since they do not explicitly keep track of error bounds (let alone margins), a brief sketch of these proofs is given in the full paper.

Lemma 1. Each probabilistic one-way protocol for $f$ that uses at most $N$ messages and achieves error bound $\varepsilon$ can be converted into an $N$-dimensional linear arrangement that realizes $f$ with margin $\mu \ge \varepsilon / \sqrt{N}$.



Lemma 2. Each linear arrangement (of arbitrarily high dimension) that realizes $f$ with margin $\mu$ can be converted into a $\lceil (12(n+1)/\mu)^2 \rceil$-dimensional linear arrangement that realizes $f$ with margin $\mu/2$.



Proof. The following result (whose proof can be found in [4]) is based on the technique of random projections (from Arriaga and Vempala [3]):

Let $w, x \in \mathbb{R}^r$ be arbitrary but fixed. Let $R = (R_{ij})$ be a random $(k \times r)$-matrix whose entries $R_{ij}$ are i.i.d. according to the normal distribution $\mathcal{N}(0,1)$. Consider the random projection $u_R := \frac{1}{\sqrt{k}} (Ru) \in \mathbb{R}^k$ for all $u \in \mathbb{R}^r$. Then the following holds for every constant $\mu > 0$:

$$\Pr_R \left[ |\langle w_R, x_R \rangle - \langle w, x \rangle| \ge \frac{\mu}{2} \left( L_2(w)^2 + L_2(x)^2 \right) \right] \le 4 e^{-\mu^2 k / 8}.$$

This result can be used in the obvious way to guarantee the existence of a random projection that maps an $r$-dimensional linear arrangement that realizes $f$ with margin $\mu$ to a $k$-dimensional linear arrangement that realizes $f$ with margin $\mu/2$. The dimension $k$ must exceed a critical threshold that depends on $\mu$ and $n$ (but not on $r$). If the computations are carried out carefully, it turns out that $k := \lceil (12(n+1)/\mu)^2 \rceil$ is sufficiently large. We omit the (somewhat tedious) computations in this abstract. □
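The random projection $u \mapsto \frac{1}{\sqrt{k}} Ru$ used in this proof is easy to try out. The sketch below, assuming NumPy, applies one Gaussian matrix $R$ to two small families of unit vectors and reports the largest change of an inner product. It only illustrates inner-product preservation; it does not reproduce the omitted computation behind the choice of $k$ in Lemma 2. The dimensions $r = 4000$, $k = 500$ and the helper names are arbitrary illustrative choices.

```python
import numpy as np

def random_projection(W, X, k, rng):
    """Apply the same random map u -> (1/sqrt(k)) R u to the rows of W and X,
    where R is a k x r matrix with i.i.d. N(0, 1) entries."""
    r = W.shape[1]
    R = rng.standard_normal((k, r))
    return W @ R.T / np.sqrt(k), X @ R.T / np.sqrt(k)

def unit_rows(A):
    return A / np.linalg.norm(A, axis=1, keepdims=True)

rng = np.random.default_rng(0)
r, k = 4000, 500                                  # original and target dimension
W = unit_rows(rng.standard_normal((5, r)))
X = unit_rows(rng.standard_normal((5, r)))
WR, XR = random_projection(W, X, k, rng)
# largest change of an inner product <w, x> over the 25 pairs
print(np.abs(WR @ XR.T - W @ X.T).max())
```

The same $R$ must be applied to both families; projecting $w$ and $x$ with independent matrices would destroy their inner product.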



Lemma 3. Each $N$-dimensional linear arrangement that realizes $f$ with margin $\mu$ can be converted into a probabilistic one-way protocol that uses at most $2N$ messages and achieves error bound $\varepsilon \ge \mu / \sqrt{N}$.
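To spell out how the three lemmas combine into Theorem 5, here is a short worked chain. It is our own bookkeeping of the statements above. It uses the fact, cited earlier, that C-PP does not change when only one-way protocols are allowed, and it uses $2^{\mathrm{polylog}(n)}$ loosely for "some function of this form":

$$
\begin{aligned}
(\Rightarrow)\quad & N \le 2^{\mathrm{polylog}(n)},\ \varepsilon \ge 2^{-\mathrm{polylog}(n)}
\ \overset{\text{Lemma 1}}{\Longrightarrow}\ \mu(f_n) \ge \frac{\varepsilon}{\sqrt{N}} \ge 2^{-\mathrm{polylog}(n)}, \\
(\Leftarrow)\quad & \mu := \mu(f_n) \ge 2^{-\mathrm{polylog}(n)}
\ \overset{\text{Lemma 2}}{\Longrightarrow}\ k = \left\lceil \left( \tfrac{12(n+1)}{\mu} \right)^2 \right\rceil \le 2^{\mathrm{polylog}(n)},\ \text{margin } \tfrac{\mu}{2} \\
& \overset{\text{Lemma 3}}{\Longrightarrow}\ \text{at most } 2k \le 2^{\mathrm{polylog}(n)} \text{ messages},\quad \varepsilon \ge \frac{\mu/2}{\sqrt{k}} \ge 2^{-\mathrm{polylog}(n)} .
\end{aligned}
$$

At most $2^{\mathrm{polylog}(n)}$ messages means $\mathrm{polylog}(n)$ transmitted bits, which is the length requirement in the definition of C-PP.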
6 Minimal Dimension and Computational Complexity

In this section, we show how lower bounds on the minimal dimension can be 
used to derive improved lower bounds on the complexity of Boolean functions in 
various non-uniform models of computation. In this brief abstract, we focus on 
depth-2 threshold circuits. The reader interested in exponential lower bounds on 
the size of randomized OBDD’s and related models is referred to the full paper. 

The main result in this section states that (loosely speaking) Boolean functions with "high" minimal dimension cannot be computed by "small" (somewhat technically restricted) depth-2 threshold circuits. Part a) of this theorem strengthens the lower bound of Hajnal et al. [9], and Part b) generalizes and strengthens the results of Bruck and Smolensky [5], Krause [11], and Krause and Pudlák [12]. Note that for technical reasons we assume that the top threshold gate is $\pm 1$-valued. The threshold gates on the bottom level are $\{0,1\}$-valued.

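Theorem 6. Let $(f_n)$ be a family of distributed Boolean functions $f_n : \{0,1\}^n \times \{0,1\}^n \to \{-1,+1\}$. Suppose $(f_n)$ is computed by depth-2 threshold circuits in which the top gate is a linear threshold gate (with unrestricted weights). Then the following holds:

a) If the bottom level has $s$ linear threshold gates using integer weights of absolute value at most $W$, then $s = \Omega\left( \frac{d(f_n)}{nW} \right)$. In particular, $s = \Omega\left( \frac{2^{n/2}}{nW} \right)$ for $f_n = ip_n$, and $s = \Omega\left( \frac{p^{n/2-1}}{n \lceil \log p \rceil W} \right)$ for $f_{n \lceil \log p \rceil} = \mathrm{ORT}_{p,n}$.

b) If the bottom level has $s$ gates computing symmetric functions, then $s = \Omega\left( \frac{d(f_n)}{n} \right)$. In particular, $s = \Omega\left( \frac{2^{n/2}}{n} \right)$ for $f_n = ip_n$, and $s = \Omega\left( \frac{p^{n/2-1}}{n \lceil \log p \rceil} \right)$ for $f_{n \lceil \log p \rceil} = \mathrm{ORT}_{p,n}$.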

Note that the specific bounds for $ip_n$ and $\mathrm{ORT}_{p,n}$ follow immediately from the general bound for $f_n$ by applying (3) and Theorem 4.

The proof of this theorem is based on Theorem 1 and on ideas from the 
results cited above. We start with two lemmas. 

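Lemma 4. Let $G : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$ be a threshold function where, for $x, y \in \{0,1\}^n$, $G(x,y) = 1$ iff $\sum_{i=1}^n \alpha_i x_i + \sum_{i=1}^n \beta_i y_i \ge \mu$, for weights $\alpha_i, \beta_i \in \mathbb{Z}$. Then $G$ (viewed as a matrix) has rank at most $\min\left\{ \sum_{i=1}^n |\alpha_i|, \sum_{i=1}^n |\beta_i| \right\} + 1$.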

Proof. W.l.o.g. let $\sum_{i=1}^n |\alpha_i| \le \sum_{i=1}^n |\beta_i|$. Let $\alpha_{\min}$ ($\alpha_{\max}$) be the minimal (maximal) value taken by $\sum_{i=1}^n \alpha_i x_i$ as $x$ ranges over all possible inputs in $\{0,1\}^n$. As the weights are integers, this sum takes at most $\alpha_{\max} - \alpha_{\min} + 1$ distinct values. We partition the rows of $G$ according to the weight contributed by $x$ and, within each block of these rows, we partition the columns into two groups depending on whether or not the weight contributed by $y$, together with that of any row of that block, reaches the threshold $\mu$. Specifically, define the following sets of entries of $G$ for all $a$ such that $\alpha_{\min} \le a \le \alpha_{\max}$:

$$S_{a,0} := \left\{ (x,y) : \sum_{i=1}^n \alpha_i x_i = a \text{ and } \sum_{i=1}^n \beta_i y_i < \mu - a \right\},$$

$$S_{a,1} := \left\{ (x,y) : \sum_{i=1}^n \alpha_i x_i = a \text{ and } \sum_{i=1}^n \beta_i y_i \ge \mu - a \right\}.$$

Let $G_{a,0}$ and $G_{a,1}$ be the matrices that agree with $G$ on the entries in $S_{a,0}$ and $S_{a,1}$, respectively, and are 0 elsewhere. It is clear that $G_{a,0}$ is the all-0 matrix and that the non-zero entries of $G_{a,1}$ form an all-1 submatrix (a combinatorial rectangle), so that $\mathrm{rank}(G_{a,1}) \le 1$ for every $a$. Furthermore, $G = \sum_a (G_{a,0} + G_{a,1})$. Hence, by subadditivity of rank, the rank of $G$ is at most the number of distinct values taken by $a$. The latter is bounded above by $\alpha_{\max} - \alpha_{\min} + 1 \le \sum_{i=1}^n |\alpha_i| + 1$. □

Note that the same proof goes through even if we generalize the definition of $G$ by setting $G(x,y) = 1$ iff $\sum_{i=1}^n \alpha_i x_i + \sum_{i=1}^n \beta_i y_i \in T$ for an arbitrary subset $T$ of $\mathbb{Z}$. Specifically, we have the following corollary of the proof:

Corollary 2. Let $G : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$ be a symmetric function in the sense that its value depends only on $\sum_{i=1}^n (x_i + y_i)$. Then $G$ (viewed as a matrix) has rank at most $n + 1$.
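Lemma 4 and Corollary 2 can be spot-checked on small instances. The sketch below, assuming NumPy, builds the $2^n \times 2^n$ matrix of a threshold function with integer weights (the variable `theta` plays the role of the threshold $\mu$) and the matrix of a symmetric function. It compares their numerical rank with the stated bounds. The helper names, the weight range $[-3, 3]$, and the accepted set $\{1, 4, 7\}$ are arbitrary choices for illustration.

```python
import numpy as np
from itertools import product

def cube(n):
    return np.array(list(product([0, 1], repeat=n)))

def threshold_matrix(alpha, beta, theta):
    """G[x, y] = 1 iff sum_i alpha_i x_i + sum_i beta_i y_i >= theta (Lemma 4)."""
    pts = cube(len(alpha))
    sx, sy = pts @ np.asarray(alpha), pts @ np.asarray(beta)
    return (sx[:, None] + sy[None, :] >= theta).astype(float)

def symmetric_matrix(n, accept):
    """G[x, y] = 1 iff sum_i (x_i + y_i) lies in the set `accept` (Corollary 2)."""
    s = cube(n).sum(axis=1)
    return np.isin(s[:, None] + s[None, :], list(accept)).astype(float)

rng = np.random.default_rng(0)
for _ in range(20):
    alpha = rng.integers(-3, 4, size=5)           # integer weights in [-3, 3]
    beta = rng.integers(-3, 4, size=5)
    theta = int(rng.integers(-5, 6))
    G = threshold_matrix(alpha, beta, theta)
    bound = min(np.abs(alpha).sum(), np.abs(beta).sum()) + 1
    assert np.linalg.matrix_rank(G) <= bound

print(np.linalg.matrix_rank(symmetric_matrix(5, {1, 4, 7})))   # at most n + 1 = 6
```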



Lemma 5. Let $f : \{0,1\}^n \times \{0,1\}^n \to \{-1,+1\}$ be a Boolean function computed by a depth-2 threshold circuit $G$ with the top gate using unrestricted weights and each of the bottom gates using weights of absolute value at most $W$. Then there is a real matrix $F$ such that $\operatorname{sign}(F(x,y)) = f(x,y)$ for all $x, y \in \{0,1\}^n$ and $\mathrm{rank}(F) = O(snW)$, where $s$ is the number of bottom gates.

Proof. Let the top gate of $G$ have weights $\phi_1, \ldots, \phi_s$ and threshold $\phi_0$. Hence we can write $f(x,y) = \operatorname{sign}\left( \sum_{i=1}^s \phi_i G_i(x,y) - \phi_0 \right)$, where the $G_i$ are the functions computed by the bottom gates. Define the matrix $F := \phi_1 G_1 + \cdots + \phi_s G_s - \phi_0 J$, where $J$ is the all-1's $2^n \times 2^n$ matrix. It is clear that $f(x,y) = \operatorname{sign}(F(x,y))$ for all $x, y \in \{0,1\}^n$. Moreover, $\mathrm{rank}(F) \le 1 + \sum_{i=1}^s \mathrm{rank}(G_i) \le 1 + s(1 + nW)$ by Lemma 4. □
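The construction in this proof can be mirrored directly. The following sketch, assuming NumPy, draws a small random depth-2 circuit with $s$ bottom threshold gates of integer weights in $[-W, W]$ and real top weights. It forms $F = \phi_1 G_1 + \cdots + \phi_s G_s - \phi_0 J$ and compares $\mathrm{rank}(F)$ with $1 + s(1 + nW)$. The parameters $n = 6$, $s = 3$, $W = 1$ are chosen so that the bound (22) is smaller than the matrix size (64); the circuit itself is random and hypothetical.

```python
import numpy as np
from itertools import product

def bottom_gate(alpha, beta, theta, pts):
    """{0,1}-valued threshold gate: 1 iff <alpha, x> + <beta, y> >= theta."""
    return (pts @ alpha)[:, None] + (pts @ beta)[None, :] >= theta

def depth2_sign_matrix(gates, phi, phi0, pts):
    """F = phi_1 G_1 + ... + phi_s G_s - phi_0 J; the circuit computes f = sign(F)."""
    F = -phi0 * np.ones((len(pts), len(pts)))
    for (alpha, beta, theta), w in zip(gates, phi):
        F += w * bottom_gate(alpha, beta, theta, pts)
    return F

n, s, W = 6, 3, 1
pts = np.array(list(product([0, 1], repeat=n)))
rng = np.random.default_rng(1)
gates = [(rng.integers(-W, W + 1, size=n), rng.integers(-W, W + 1, size=n),
          int(rng.integers(-n, n + 1))) for _ in range(s)]
phi = rng.standard_normal(s)                      # unrestricted real top weights
F = depth2_sign_matrix(gates, phi, phi0=rng.standard_normal(), pts=pts)
print(np.linalg.matrix_rank(F), "<=", 1 + s * (1 + n * W))
```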

Using Corollary 2, one can similarly prove that if $f$ is computed by a depth-2 circuit with a threshold gate at the top and $s$ symmetric gates at the bottom, then there is a matrix $F$ of rank $O(sn)$ that sign-represents $f$.

Proof (of Theorem 6). By Lemma 5, if a depth-2 threshold circuit computes $f_n(x,y)$, then there is a matrix $F_n$ such that $\operatorname{sign}(F_n(x,y)) = f_n(x,y)$ and $\mathrm{rank}(F_n) = O(snW)$. On the other hand, $\mathrm{rank}(F_n) \ge d(f_n)$ (because there is always a $\mathrm{rank}(F_n)$-dimensional linear arrangement for $F_n$, which is then also a linear arrangement for $f_n$). Comparing the upper and lower bounds on $\mathrm{rank}(F_n)$, we get $snW = \Omega(d(f_n))$. This proves part a) of the theorem. Part b) is proved similarly by means of Corollary 2. □
Abstract. In this paper we focus on the problem of designing very fast parallel algorithms for constructing the upper envelope of straight-line segments that achieve the $O(n \log H)$ work bound for input size $n$ and output size $H$. Our algorithms are designed for the arbitrary CRCW PRAM model. We first describe an $O(\log n \cdot (\log H + \log\log n))$ time deterministic algorithm for the problem that achieves the $O(n \log H)$ work bound for $H = \Omega(\log n)$. We present a fast randomized algorithm that runs in expected time $O(\log H \cdot \log\log n)$ with high probability and does $O(n \log H)$ work. For $\log H = \Omega(\log\log n)$, we can achieve a running time of $O(\log H)$ while simultaneously keeping the work optimal. We also present a fast randomized algorithm that runs in $O(\log n / \log k)$ time with $nk$ processors, $k > \log^{\Omega(1)} n$. The algorithms do not assume any input distribution, and the running times hold with high probability.



1 Introduction 

The upper envelope of a set of $n$ line segments in the plane is an important concept in visibility and motion planning problems. The segments are regarded as opaque obstacles, and their upper envelope consists of the portions of the segments visible from the point $(0, +\infty)$. The complexity of the upper envelope is the number of distinct pieces of segments that appear on it. If the segments are nonintersecting, then the complexity of their upper envelope is linear in $n$. On the other hand, if the segments are allowed to intersect, then the worst-case complexity of the upper envelope increases to $O(n\alpha(n))$, where $\alpha(n)$ is the functional inverse of Ackermann's function [2].

There exists an $O(n \log n)$ time algorithm to compute the upper envelope of $n$ line segments, and this is worst-case optimal [15]. However, this is true only if the output size, i.e., the number of vertices (or edges) of the upper envelope, is large. More specifically, the time bound of $O(n \log n)$ is tight when the ordered output size is $\Omega(n)$. If the output size is small, however, we should be able to do much better. The output size of a problem is an important parameter in measuring the efficiency of an algorithm, and one can obtain considerably better algorithms in terms of it. There exists an $O(n \log H)$ algorithm for the problem [17], where $H$ is the output size, which implies a linear time algorithm for constant output size.
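To make the notions of upper envelope and envelope complexity concrete, here is a deliberately naive reference sketch in Python. It is not any of the algorithms discussed in this paper. It collects all endpoint and crossing x-coordinates, picks the topmost segment in each resulting slab, and merges adjacent slabs won by the same segment, which takes roughly cubic time. It assumes non-vertical segments in general position (no overlapping collinear pieces) and uses exact rational arithmetic; `upper_envelope` is a hypothetical name. The number of returned pieces is the complexity of the envelope.

```python
from fractions import Fraction
from itertools import combinations

def _y(seg, x):
    """y-coordinate of the (non-vertical) segment's supporting line at x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def upper_envelope(segments):
    """Naive reference: list of pieces (segment index, x_left, x_right) of the envelope."""
    segs = [tuple(sorted((Fraction(x), Fraction(y)) for x, y in s)) for s in segments]
    xs = {p[0] for s in segs for p in s}           # all endpoint x-coordinates
    for a, b in combinations(segs, 2):             # plus all crossing x-coordinates
        sa = (a[1][1] - a[0][1]) / (a[1][0] - a[0][0])
        sb = (b[1][1] - b[0][1]) / (b[1][0] - b[0][0])
        if sa != sb:
            x = (b[0][1] - sb * b[0][0] - a[0][1] + sa * a[0][0]) / (sa - sb)
            if max(a[0][0], b[0][0]) <= x <= min(a[1][0], b[1][0]):
                xs.add(x)
    xs = sorted(xs)
    pieces = []
    for lo, hi in zip(xs, xs[1:]):                 # no event strictly inside (lo, hi)
        mid = (lo + hi) / 2
        alive = [i for i, s in enumerate(segs) if s[0][0] <= mid <= s[1][0]]
        if not alive:
            continue                               # a gap in the envelope
        top = max(alive, key=lambda i: _y(segs[i], mid))
        if pieces and pieces[-1][0] == top and pieces[-1][2] == lo:
            pieces[-1] = (top, pieces[-1][1], hi)  # same segment continues
        else:
            pieces.append((top, lo, hi))
    return pieces

crossing = [((0, 0), (2, 2)), ((0, 2), (2, 0))]
print([(i, float(l), float(r)) for i, l, r in upper_envelope(crossing)])
# [(1, 0.0, 1.0), (0, 1.0, 2.0)]
```

A short segment lying above a long one splits the long one into two envelope pieces, so even two segments can produce three pieces.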

We are aiming at designing an output size sensitive parallel algorithm that 
speeds up optimally with output size in the sub-logarithmic time domain. 

In the context of sequential algorithms, it has been observed that the upper envelope of $n$ line segments can be computed in $O(n\alpha(n) \log n)$ time by a straightforward application of the divide-and-conquer technique. Hershberger describes an optimal $O(n \log n)$ algorithm by reorganizing the divide-and-conquer computation [15].

Clarkson describes a randomized $O(n\alpha(n) \log n)$ algorithm for computing a face in an arrangement of line segments [8]. The upper envelope of line segments can be viewed as one such face. Guibas et al. [11] gave a deterministic algorithm that computes a face for a collection of line segments in $O(n\alpha^2(n) \log n)$ time.

The process of gift wrapping amounts to locating an extreme segment of the upper envelope and then "walking" along successive segments of the envelope, which are defined by the points of intersection of the lines in the given arrangement. This results in a simple $O(nH)$ time algorithm. Frank Nielsen and Mariette Yvinec present a deterministic sequential output-sensitive algorithm that computes the upper envelope in $O(n \log H)$ time [17]. Their algorithm is based on the marriage-before-conquest paradigm for computing the convex hull of a set of fixed planar convex objects, which they also apply to compute the upper envelope of line segments. They use the partitioning technique of Hershberger to obtain an $O(n \log H)$ time bound for the upper envelope. They claim that their algorithms are easily parallelizable on the EREW PRAM, following the algorithm of S. Akl [3]. This implies a parallel output-sensitive algorithm that is optimal for a number of processors bounded by $O(n^z)$, $0 < z < 1$.

Recently, Wei Chen and Koichi Wada gave a deterministic algorithm that computes the upper envelope of line segments in $O(\log n)$ time using $O(n)$ processors [7]. If the line segments are nonintersecting and sorted, the envelope can be found in $O(\log n)$ time using $O(n/\log n)$ processors. Their methods also imply a fast sequential result: the upper envelope of $n$ sorted line segments can be found in $O(n \log\log n)$ time.

We present algorithms whose running times are output-sensitive even in the sub-logarithmic time range while keeping the work optimal. For designing fast output-sensitive algorithms, we have to cope with the problem that the output size is an unknown parameter. Moreover, we also have to rapidly eliminate input line segments that do not contribute to the final output without incurring a high cost; see Gupta and Sen [13] for a more detailed discussion.

We present one deterministic and two randomized algorithms that construct the upper envelope of $n$ line segments. Both our randomized algorithms are of Las Vegas type; that is, they always provide a correct output, and the bounds hold with high probability.¹

We first describe a deterministic algorithm for the problem that takes $O(\log n \cdot (\log H + \log\log n))$ time using $O(n/\log n)$ processors. The algorithm is based on the marriage-before-conquest approach, and we use the ideas presented by Nielsen and Yvinec [17] to bound the size of the sub-problems. Our algorithm achieves the $O(n \log H)$ work bound for $H = \Omega(\log n)$.

Next we present a fast randomized algorithm. The expected running times hold with high probability. The algorithm runs in $O(\log H)$ expected time using $n$ processors for $H > \log^{\varepsilon} n$, $\varepsilon > 0$. For smaller output sizes the algorithm has an expected running time of $O(\log H \cdot \log\log n)$, keeping the number of operations optimal. Therefore, for small output sizes our algorithm runs very fast. The algorithm uses the iterative method of Gupta and Sen [13].

We also describe a randomized algorithm that solves the problem in $O(\log n / \log k)$ time with high probability using $nk$ processors for $k > \log^{\Omega(1)} n$. This algorithm is based on the general technique given by Sen for developing sub-logarithmic algorithms [21].

2 Deterministic Algorithm for Upper Envelope 

Our algorithm is based on the marriage-before-conquest technique, and it uses the ideas presented by Nielsen and Yvinec [17] to bound the size of the sub-problems.

Let $S$ be the set of $n$ line segments, and denote its upper envelope by
$UE(S)$. If we know that there exists a partition of $S$ into $k$ subsets
$P_1, \ldots, P_k$ such that each subset $P_i$, $i \in [1, k]$, is a set of non-overlapping line segments,
then in the marriage-before-conquest procedure we can bound the size of the
sub-problems by $n/2 + k$. We therefore transform the set $S$ into a set $T$ of line
segments partitioned into subsets, each consisting of non-overlapping
line segments, such that $UE(S) = UE(T)$, and we apply the marriage-before-conquest
procedure to $T$. To compute $T$, we partition $S$ using a partition tree and use an
observation communicated by J. Hershberger to keep the size of $T$ linear. With a good
estimate of $k$, we run the marriage-before-conquest procedure on $T$ (the number of
stages of this procedure depends on $k$), reduce the total size of the problem to
$n/\log n$, and then solve the remaining problem directly.

2.1 Transforming the Set S of Line Segments into the Set T 

The Vertical Decomposition. We call two line segments x-separated
if there exists a vertical line such that the two segments lie entirely on
different sides of it. The vertical decomposition will give rise to a
set of x-separated line segments.

¹ The term high probability implies probability exceeding $1 - 1/n^c$ for any
predetermined constant $c$, where $n$ is the input size.
Let $Q \subseteq S$ be a subset of $m$ line segments. Through each vertex of the upper
envelope of $Q$, we draw a line parallel to the y-axis. These vertical lines induce a
decomposition of the line segments of $Q$ into smaller line segments called
tiny line segments. We keep only the tiny line segments that participate in the
upper envelope; the part of a line segment that does not contribute to this
upper envelope cannot contribute to the final upper envelope either. The tiny line
segments are x-separated, i.e., they are non-overlapping. The number of tiny
line segments equals the complexity of the upper envelope of $Q$, and each tiny line
segment is defined by a single line segment.
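To make the definition concrete, here is a brute-force sequential sketch of the vertical decomposition of one group $Q$. It is not the parallel procedure the paper relies on. The sketch collects every endpoint abscissa and every pairwise crossing, finds the topmost segment in each elementary x-interval, and merges consecutive intervals won by the same segment, so that each output triple is one tiny line segment. The function names, the output format (segment index, x_left, x_right) and the restriction to non-vertical segments with integer or rational coordinates are our own assumptions.

```python
from fractions import Fraction
from itertools import combinations


def _normalize(seg):
    """Exact coordinates, endpoints ordered by x; vertical segments are rejected."""
    (x1, y1), (x2, y2) = sorted(seg)
    if x1 == x2:
        raise ValueError("vertical segments are not handled in this sketch")
    return (Fraction(x1), Fraction(y1)), (Fraction(x2), Fraction(y2))


def _line(seg):
    """Slope and intercept of the line supporting a normalized segment."""
    (x1, y1), (x2, y2) = seg
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1


def _crossing_x(s, t):
    """Abscissa where s and t cross inside their common x-range, or None."""
    (ms, cs), (mt, ct) = _line(s), _line(t)
    if ms == mt:
        return None  # parallel supporting lines do not cross
    x = (ct - cs) / (ms - mt)
    lo, hi = max(s[0][0], t[0][0]), min(s[1][0], t[1][0])
    return x if lo <= x <= hi else None


def vertical_decomposition(segments):
    """Tiny segments of UE(Q) as (segment index, x_left, x_right), in x-order."""
    segs = [_normalize(s) for s in segments]
    lines = [_line(s) for s in segs]
    xs = {p[0] for s in segs for p in s}  # all endpoint abscissae
    for s, t in combinations(segs, 2):
        x = _crossing_x(s, t)
        if x is not None:
            xs.add(x)
    xs = sorted(xs)
    pieces = []
    for a, b in zip(xs, xs[1:]):
        mid = (a + b) / 2  # no envelope vertex lies strictly between a and b
        live = [i for i, s in enumerate(segs) if s[0][0] < mid < s[1][0]]
        if not live:
            continue  # gap: no segment covers this interval
        top = max(live, key=lambda i: lines[i][0] * mid + lines[i][1])
        if pieces and pieces[-1][0] == top and pieces[-1][2] == a:
            pieces[-1] = (top, pieces[-1][1], b)  # the same envelope edge continues
        else:
            pieces.append((top, a, b))  # a new envelope edge starts at x = a
    return pieces
```

For two segments crossing in an X shape, each contributes one tiny segment. For a long horizontal segment with a short one lying above its middle, the long segment contributes two tiny segments. This is why the number of tiny segments tracks the complexity of the envelope rather than the number of input segments.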



The Partition Tree. To generate the set T from the set S, we group the line 
segments of the set S into k groups and compute the vertical decomposition of 
each group. Thus the tiny line segments formed in this decomposition form the 
set T, which is partitioned into k subsets, each consisting of non-overlapping 
segments. However, such a grouping does not guarantee that the size
of $T$, i.e., the total number of tiny line segments, is linear.

We use the technique of J. Hershberger to group the line segments of $S$. The
idea is to create groups so that the size of the vertical decomposition of each
group remains linear. Let $p$ be a given integer. We first construct the interval
tree of depth $\log p$ on the endpoints of the line segments, as presented by Nielsen
and Yvinec [17]. Next we load this tree with the line segments by allocating the $n$
line segments to the nodes of the tree according to the lowest common ancestor
of their endpoints. By the communication of J. Hershberger [15], the upper envelope
of the line segments belonging to an internal node is linear in the number
of line segments (he proves this using the fact that the line segments belonging to a
node all cross a common vertical line), and the upper envelope of the line segments
belonging to different nodes at the same internal level is also linear in the number of
line segments, since they are x-separated. We group the line segments of each
internal level of the interval tree into groups of size $p$, forming $O(n/p + \log p)$
groups. Moreover, two line segments belonging to different leaves of the tree are
x-separated. We group the line segments belonging to the leaves by picking one
segment from each leaf to form a group, obtaining an additional $n/p$ groups,
each of size $p$. We then compute the vertical decomposition of the upper envelope of each
group, obtaining $O(2n/p + \log p)$ groups, each consisting of $O(p)$ x-separated tiny
line segments. The set $T$ is the union of all these tiny line segments.
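The allocation and grouping steps can be sketched sequentially as follows. This is our own simplified illustration, not Hershberger's or the authors' code. Segments are reduced to their x-extents, which are the only data the allocation uses. Every node splits at the median of the endpoint abscissae stored below it, and the tree depth is taken as $\lceil \log_2 p \rceil$. The names `allocate_to_interval_tree` and `hershberger_groups` are hypothetical, and the vertical decomposition of each group would follow as a separate step.

```python
from collections import defaultdict


def allocate_to_interval_tree(extents, depth):
    """Map (level, node index within level) -> indices of the segments stored there.

    A segment, given by its x-extent (x1, x2) with x1 < x2, stays at the first
    node whose splitting line x = split it touches (the lowest common ancestor
    of its endpoints); otherwise it descends to a leaf at level `depth`.
    """
    xs = sorted(x for extent in extents for x in extent)
    buckets = defaultdict(list)
    for idx, (x1, x2) in enumerate(extents):
        lo, hi, level, node = 0, len(xs), 0, 0
        while level < depth:
            mid = (lo + hi) // 2
            split = xs[mid]  # median endpoint under the current node
            if x2 < split:
                hi, node = mid, 2 * node
            elif x1 > split:
                lo, node = mid, 2 * node + 1
            else:
                break  # the segment crosses the vertical line x = split
            level += 1
        buckets[(level, node)].append(idx)
    return buckets


def hershberger_groups(extents, p):
    """Group segment indices so that every group has a linear-size envelope."""
    depth = max(1, (p - 1).bit_length())  # equals ceil(log2 p) for p >= 2
    buckets = allocate_to_interval_tree(extents, depth)
    items = sorted(buckets.items())
    groups = []
    # Internal levels: nodes on one level are x-separated, so chunk the level.
    for level in range(depth):
        members = [i for (lv, _), ids in items if lv == level for i in ids]
        groups.extend(members[j:j + p] for j in range(0, len(members), p))
    # Leaves: one segment per leaf and round, so a group is x-separated.
    leaves = [ids for (lv, _), ids in items if lv == depth]
    for rnd in range(max((len(ids) for ids in leaves), default=0)):
        groups.append([ids[rnd] for ids in leaves if rnd < len(ids)])
    return groups
```

Internal-level groups may mix segments from different nodes of the same level, which the argument allows because those segments are x-separated. Leaf groups hold at most one segment per leaf, that is, at most $2^{\text{depth}} < 2p$ segments.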

Thus we obtain a new set $T$, partitioned into $O(2n/p + \log p)$ subsets,
each consisting of $O(p)$ x-separated tiny line segments, such that $UE(T) = UE(S)$.
We choose $p$ such that $p \log p < n$.

Lemma 1. For a fixed integer $p$, a set $T$ partitioned into $O(2n/p + \log p)$
subsets, each consisting of $p$ x-separated line segments derived from $S$, such that
$UE(S) = UE(T)$, can be generated in $O(\log n \cdot \log p)$ time using $O(n/\log n)$ processors.

Proof. Computing $T$ requires computing the median, the lowest common ancestors
of the endpoints, and $O(2n/p + \log p)$ upper envelopes, each of size $O(p)$. All
of these can be done within the required bounds. □
Remark 1. In fact, the same procedure runs in $O(\log p)$ expected time using $n$
processors with high probability.

2.2 The Marriage-Before-Conquest Procedure 

We assume that we are given a partition of the set $T$ into $k$ subsets $P_1, \ldots, P_k$
such that each subset $P_i$, $i \in [1, k]$, is a set of non-overlapping line segments. We call
the two vertical lines passing through the endpoints of a line segment its
walls. Let $W$ be the set of walls, with cardinality $|W| = 2n$. We
define a slab as the portion of the Euclidean plane between two walls. The
upper envelope $UE(S)$ can be described as the x-ordered sequence of its edges.
The marriage-before-conquest (MBC) procedure computes the subsequence of the
edges of $UE(S)$ contained in the slab $B$ bounded by the leftmost and rightmost
walls. We say that a line segment spans a slab if it intersects the slab but
does not have a wall inside the slab.

Find the median wall $w_m$ of $W$. Compute the edge $b$ of the upper
envelope intersecting $w_m$. Split $W$ into two subsets $W_1$ (resp. $W_2$) consisting of the
walls that lie to the left (resp. right) of the edge $b$ and do not intersect it. Call
the slab formed by $W_1$ (resp. $W_2$) $B_1$ (resp. $B_2$). The line segments to be
considered in a slab are those that have a wall inside the slab and those that
span the slab. If a slab has no walls, then stop and retain the line segments spanning
such a slab (we will take care of these line segments later); otherwise, recurse.

Observation 1. At every stage, each sub-problem has at least one output edge. 

The Analysis. Our partitioning scheme guarantees that each sub-problem has
at most one spanning line segment from each subset of the partition (since the line
segments belonging to a subset are x-separated). Hence each sub-problem
has at most $k$ spanning line segments, and the size of each sub-problem is
bounded by $O(n/2 + k)$, where $k = O(2n/p + \log p) = O(n/p)$ for $p \log p < n$,
by Lemma 1.

Lemma 2. From Observation 1, if we choose the integer $p$ such that $H \log n < p$,
the size of the problem reduces to $n/\log n$ after $O(\log p)$ stages of the MBC
procedure.

2.3 The Main Algorithm 

The algorithm works in two phases. The estimation phase. During this phase
we find a good estimate of $p$ (with $p < (H \log n)^2$ and $p \log p < n$). We start with
a constant value of $p$, run MBC, and check whether the size of the problem reduces to
$n/\log n$ after $\log p$ stages. If this happens for a small $p$, our job is done.
Otherwise, by Lemma 2, we know that once $p$ is large enough, that is, $p > H \log n$,
this is guaranteed to happen. Thus $p$ always remains below $(H \log n)^2$. For every $p$ we
also check whether $p \log p > n$; since $n < p \log p$ and $p < (H \log n)^2$, this implies
$H = n^{\Omega(1)}$. If so, we use an $O(n \log n)$ algorithm to solve the problem directly and
stop. The terminating phase. During this phase we solve the problem (of size $n/\log n$)
directly.
The Analysis. For a fixed $p$, by Lemma 1, the set $T$ can be generated in
$O(\log n \cdot \log p)$ time using $O(n/\log n)$ processors. The total time spent during
the MBC procedure is $O(\log n \cdot \log p)$ using $O(n/\log n)$ processors.

Let $p_e$ denote the final estimate of $p$; then $p_e < (H \log n)^2$. If $p_i$ is the $i$-th estimate
of $p$, then the total time spent in this phase is $O(\log n \sum_i \log p_i) = O(\log n \cdot \log p_e)$,
which is $O(\log n \cdot (\log H + \log\log n))$. The problem of size $n/\log n$ can be solved
in $O(\log n)$ time. Thus we have the following theorem.
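The bound $\sum_i \log p_i = O(\log p_e)$ relies on the estimates growing fast enough. The text does not spell out the update rule. The sketch below assumes that $p$ is squared after each failed trial, which is consistent with the bound $p_e < (H \log n)^2$. The predicate `reduces` is a hypothetical stand-in for "run MBC for $\log p$ stages and check whether the problem size dropped to $n/\log n$". The start value `p0 = 4` is an arbitrary constant.

```python
import math


def estimation_schedule(reduces, p0=4):
    """Successive estimates p_1 = p0, p_{i+1} = p_i ** 2, stopping at the first p
    for which the hypothetical predicate `reduces(p)` holds."""
    schedule = [p0]
    while not reduces(schedule[-1]):
        schedule.append(schedule[-1] ** 2)
    return schedule


def schedule_cost(schedule):
    """Sum of log2(p_i); the estimation phase takes O(log n) times this."""
    return sum(math.log2(p) for p in schedule)
```

For instance, suppose the test first succeeds once $p$ exceeds 1000, which plays the role of $H \log n$. The schedule is then 4, 16, 256, 65536. The final estimate is below $1000^2$, and the cost $2 + 4 + 8 + 16 = 30$ is less than $2 \log_2 65536 = 32$. With squaring, $\log p_i$ doubles at every step, so the sum of all earlier terms is smaller than the last one.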

Theorem 1. The upper envelope of $n$ line segments can be constructed in
$O(\log n \cdot (\log H + \log\log n))$ time using $O(n/\log n)$ processors on a
deterministic CRCW PRAM.

3 The Randomized Optimal Algorithm 

In this section we present a randomized algorithm using $O(n)$ processors. The
algorithm is iterative rather than recursive. The underlying algorithm
is similar to the algorithm described for planar convex hulls by Gupta and
Sen [13], so the entire analysis of their algorithm carries over to our case.
Here we detail only the steps that are specific to our problem. We
assume for simplicity that no more than two line segments intersect at the same
point, and that no two points among the segment endpoints and the intersection points
share an abscissa or an ordinate. To take care of the gaps, we also
introduce an extra line segment $L$ lying below all the line segments.

The idea is to construct the upper envelope of a random sample $R$ of the
set $S$ of straight line segments and filter out the redundant segments that do
not contribute to the upper envelope of $S$. We pick a sample of line segments
of constant size, compute its upper envelope, discard all the redundant line
segments, and iterate on the reduced problem. In each successive iteration we
square the size of the sample to be chosen. We iterate until either the size of
the problem falls below some threshold or the sample size exceeds
some threshold. At this point, if the problem size is less than $n^{\epsilon}$, we
use a brute-force algorithm to solve the problem in constant time; otherwise we use
the algorithm described by Chen and Wada [7] to compute the upper envelope
directly.

To prove any interesting results we must determine how quickly the problem 
size decreases. The Random Sampling Lemmas discussed in the next section 
guarantee that when a sample of size is chosen, the problem size reduces 

fast. 

3.1 The Random Sampling Lemmas 

For any subset $P \subseteq S$, consider two adjacent edges of the upper envelope
of $P$. Draw two vertical lines, one through the left endpoint of the left edge and
the other through the right endpoint of the right edge. We define a slab as the
portion of the Euclidean plane between these two vertical lines. A configuration
$\sigma$ (which we call a region) is the region lying above the upper envelope in
such a slab. We say that the line segments adjacent to $\sigma$ define $\sigma$. Notice that
we include the line segment $L$ (defined previously) in every subset $P$ of $S$. From
now on, whenever we talk of a subset $P$ of $S$, we actually mean
$P \cup \{L\}$.

We define the configuration space $\Pi(S)$ as the set of all such configurations
defined by all the subsets of $S$. The set of line segments intersecting a region
$\sigma$ is called the conflict list of $\sigma$ and is denoted by $L(\sigma)$; its cardinality
$|L(\sigma)|$, denoted by $\ell(\sigma)$, is called the conflict size. Let $\Pi^i(S)$ denote the set of
configurations in $\Pi(S)$ with $\ell(\sigma) = i$.

Let $R$ be a random subset of $S$. We define the subspace $\Pi(R)$ as the set of all
configurations $\sigma \in \Pi(S)$ whose defining line segments are a subset of $R$.
The conflict list in $R$ of such a configuration is $L(\sigma) \cap R$. Define $\Pi^*(R)$ to be the
set of configurations in $\Pi(R)$ that lie in the slabs formed by the pairs of adjacent edges
of the upper envelope of $R$, scanned from left to right. Clearly, for a region $\sigma \in \Pi^*(R)$,
the set of line segments of $R$ conflicting with $\sigma$ is empty, i.e., $L(\sigma) \cap R = \emptyset$.
That is, $\Pi^*(R) \subseteq \Pi^0(R)$.

Clearly, our configuration space $\Pi(S)$ has bounded valence, since the number
of regions defined by the same set of line segments is bounded by a constant; in fact,
this number is exactly one. Thus the following random sampling results
due to Clarkson and Shor hold for our problem.

Lemma 3 ([9,16]). For some suitable constant $k$ and large $n$,

$$\Pr\Big[\sum_{\sigma \in \Pi^*(R)} \ell(\sigma) > kn\Big] < 1/c$$

for some constant $c > 1$, where the probability is taken over all possible choices of
the random sample $R$.

The above lemma gives a bound on the total size of the sub-problems.

Lemma 4 ([9,16]). For some suitable constant $k_1$ and large $n$,

$$\Pr\Big[\max_{\sigma \in \Pi^*(R)} \ell(\sigma) > k_1 (n/r) \log r\Big] \le 1/c$$

for some constant $c > 1$, where the probability is taken over all possible choices of
the random sample $R$ such that $|R| = r$.

This lemma gives a bound on the maximum size of each sub-problem.

A sample is “good” if it satisfies the properties of Lemmas 3 and 4
simultaneously. In fact, we have the following.

Lemma 5. We can find a sample $R$ that satisfies both Lemmas 3 and 4
simultaneously with high probability. Moreover, this can be done in $O(\log r)$ time
and $O(n \log r)$ work with high probability.

Proof. This can be done using Resampling and Polling [18]. □ 
We call a region “critical” if it contains at least one output vertex. Let $\Pi^*(R)$
denote the set of regions introduced by the random sample $R$, and let $\Pi_h(R)$ denote
the set of critical regions. Since $|\Pi_h(R)| \le H$, a good sample clearly also satisfies the
following property.

Lemma 6. For a good sample $R$ with $|R| = r$,

$$\sum_{\sigma \in \Pi_h(R)} \ell(\sigma) = O(nH \log r / r) .$$

This will be used repeatedly in the analysis to bound the number of non-redundant
line segments whenever $H < r/\log r$.
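Lemma 6 follows from Lemma 4 together with the bound on the number of critical regions. One way to write out the step, using $\Pi_h(R) \subseteq \Pi^*(R)$, is:

$$\sum_{\sigma \in \Pi_h(R)} \ell(\sigma) \;\le\; |\Pi_h(R)| \cdot \max_{\sigma \in \Pi^*(R)} \ell(\sigma) \;\le\; H \cdot k_1 \frac{n}{r} \log r \;=\; O\!\left(\frac{nH \log r}{r}\right).$$

When $H < r/\log r$, the right-hand side is at most $k_1 n$, so the segments that survive the filter remain within a constant factor of $n$.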

We say that a line segment is redundant if it does not intersect any critical 
region, i.e. either it lies entirely below the upper envelope of R or it intersects 
some region lying above the upper envelope but not the critical region. Consider 
a region that does not contain any output vertex. Clearly only one line segment 
is useful in this region, which is the line segment that has the maximum y- 
coordinate in that region. Such a line segment must intersect at least one of the 
regions containing an output point and is therefore retained. 

3.2 The Algorithm 

Let $n_i$ (respectively $r_i$) denote the size of the problem (respectively the sample size)
at the $i$-th iteration, with $n_1 = n$. Repeat the following procedure until $r_i > n^{\epsilon}$
(this condition guarantees that the sample size is never too big) or $n_i < n^{\epsilon}$, for
some fixed $\epsilon$ between 0 and 1. If $n_i < n^{\epsilon}$, then find the upper envelope of the $n_i$
line segments directly using a brute-force algorithm; otherwise, do one more iteration
and find $n_{i+1}$. If $n_{i+1} = O(n^{\epsilon})$, then we use a brute-force algorithm; otherwise,
we use the algorithm of Chen and Wada [7] to find the upper envelope of the $n_{i+1}$ line
segments. The following is the description of the $i$-th iteration of the algorithm.
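Assume the sample size is squared in each successive iteration, as stated earlier, starting from a constant $r_1 \ge 2$. Then the condition on the sample size alone bounds the number of iterations:

$$r_i = r_1^{\,2^{\,i-1}}, \qquad r_i > n^{\epsilon} \iff 2^{\,i-1} \log r_1 > \epsilon \log n \iff i > 1 + \log\frac{\epsilon \log n}{\log r_1}.$$

For fixed $\epsilon$, the loop therefore stops after $O(\log\log n)$ iterations at the latest. It may stop earlier if $n_i$ drops below $n^{\epsilon}$ first.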

Rand-UE (i) 

1. Choose a “good” sample $R$ of size $r_i$, where $r_1$ is a constant and $r_i = r_{i-1}^2$ for
$i > 1$. Find the upper envelope of $R$.

2. (a) For every line segment, find the regions that it intersects.

(b) Discard the redundant line segments lying below the upper envelope of
$R$ (Sect. 4 discusses this in detail).

(c) If the sum, over all the remaining line segments, of the number of regions
intersecting a line segment is $O(n)$, then continue; else go to Step 1.

3. Filter out the segments not conflicting with any critical region as follows:
(a) Compute $\Pi_h(R)$ as follows.

For every region $\sigma$ do the following:
i. Find the line segments intersecting $\sigma$ and assign as many
processors to it (see Lemma 3 and Step 2(c) above).
ii. Consider the points of intersection of the line segments with the
vertical lines defining the region (the boundaries of the slab lying above
the edges of the upper envelope), including the intersection points
with the edges of the upper envelope. If the points with maximum
y-coordinate belong to the same line segment, say $s$, and there does not
exist any segment above $s$ lying entirely in $\sigma$, then $\sigma \notin \Pi_h(R)$; else
$\sigma \in \Pi_h(R)$.

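(b) Delete a line segment if it does not belong to $\bigcup_{\sigma \in \Pi_h(R)} L(\sigma)$.
4. The set of line segments for the next iteration is $\bigcup_{\sigma \in \Pi_h(R)} L(\sigma)$, and its size
is $n_{i+1} = |\bigcup_{\sigma \in \Pi_h(R)} L(\sigma)|$. Increment $i$ and go to Step 1.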

The analysis of Gupta and Sen [13] goes through here as well. However, detecting
and deleting the redundant segments requires further explanation, which is given
in Sect. 4. Hence we have the following results.

Lemma 7. The upper envelope of $n$ possibly intersecting straight line segments
can be constructed in $O(\max\{\log H, \log\log n\})$ time with high probability using
a linear number of processors.



Remark 2. For $\log H = \Omega(\log\log n)$, this attains the ideal $O(\log H)$ running
time with $n$ processors, keeping the work optimal.

Using the standard slow-down method, the algorithm is made optimal for all
values of $H$. That is, we use $n/\log\log n$ processors instead of
$n$ processors.

Theorem 2. The upper envelope of $n$ possibly intersecting straight line segments
can be constructed in $O(\log H \cdot \log\log n)$ expected time and $O(n \log H)$ operations
with high probability in the CRCW PRAM model, where $H$ is the size of the upper
envelope.

Proof. Refer to Gupta and Sen [13]. □ 

4 Finding the Redundant Line Segments 

For the purpose of this section we call a segment redundant if it lies entirely below 
the upper envelope of the random sample. Otherwise, we call it non-redundant. 
Consider an upper envelope of the random sample of line segments, divided into 
regions (as defined in Sect. 3.1). If a line segment intersects the upper envelope 
at any point then it is non-redundant. However, if it does not intersect, then two 
cases arise: 1. Either it lies entirely above the upper envelope in which case, it is 
non-redundant or, 2. It lies entirely below the upper envelope in which case, it 
is redundant. So to determine if the line segment is redundant or not, we need 
to determine if it intersects the upper envelope. Let the vertices of the upper 
envelope spanned by a line segment be called the defining points for that line 
segment. If a line segment intersects the part of the envelope defined only by the 
defining points, then the line segment intersects the upper envelope and hence
it is declared non-redundant. If it does not intersect this part and it lies
above the upper envelope of its defining vertices, then it is non-redundant. Else,
if it lies below the upper envelope of its defining vertices, we consider the two 
extreme slabs which the line segment intersects. If the line segment intersects 
the part of the envelope in these extreme slabs then it is non-redundant, else 
it is redundant. Thus determining whether a line segment is redundant or not 
reduces to determining whether it intersects the upper envelope formed by its 
defining vertices. 

Lemma 8. If a line segment intersects the lower convex chain of its defining 
vertices then it intersects the upper envelope defined by them. 

Proof. Omitted. □

The task of determining whether a segment is redundant or not, reduces 
to finding whether it intersects the lower convex chain of its defining vertices. 
Checking whether it lies above or below, in case it doesn’t, can be done by 
checking it against one of the vertices of the lower convex chain in constant 
time. Clearly, a segment that does not intersect the lower convex chain and lies 
above it is non-redundant (whether it intersects the upper envelope or not). For 
the segment that does not intersect the lower convex chain and lies below it, 
we will examine the extreme slabs. Checking the extreme slabs can be done in 
constant time because we need to consider only two regions having a constant 
number of edges; check whether the line segment intersects any of these edges. 
The case when a line segment belongs entirely to one region is a special one in 
which the two extreme regions coincide. Hence, this again can be handled in 
constant time. 

4.1 The Locus Based Approach 

We use a locus-based approach to decide whether a line segment is redundant.
We define an equivalence relation in which two segments are equivalent if and only
if they have the same set of defining points. Since such a set is determined by the
first and last envelope vertices spanned by the segment, the number of equivalence
classes is $O(m^2)$. For each of these equivalence classes we take a representative
line segment and precompute the lower convex chain of its defining vertices.
Using a brute-force algorithm, we can compute the lower convex chains of the
vertices associated with all the equivalence classes in constant time using O(m®)
processors. Hence we have the following lemma.
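The defining points of a segment are the envelope vertices whose abscissae fall in the segment's x-range. An equivalence class is therefore a contiguous range of vertex indices, which is where the $O(m^2)$ count comes from. The small sequential sketch below uses hypothetical function names, with Python's `bisect` standing in for the parallel lookup. A table of precomputed chains could then be keyed by the returned pair.

```python
from bisect import bisect_left, bisect_right


def defining_class(vertex_xs, x1, x2):
    """Half-open index range [i, j) of the envelope vertices spanned by a segment.

    `vertex_xs` holds the sorted abscissae of the m envelope vertices; a vertex is
    spanned when x1 <= x <= x2. Two segments are equivalent exactly when they
    yield the same pair.
    """
    return bisect_left(vertex_xs, x1), bisect_right(vertex_xs, x2)


def all_classes(m):
    """Every possible index range, i.e. every equivalence class: (m+1)(m+2)/2 of them."""
    return [(i, j) for i in range(m + 1) for j in range(i, m + 1)]
```

With vertex abscissae 1, 3, 5, 7, the segments spanning $[2, 6]$ and $[2.5, 5.5]$ both map to the class $(1, 3)$, that is, the vertices at 3 and 5, so they share one precomputed chain.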

Lemma 9. Given an upper envelope of size $m$, we can preprocess its vertices
in constant time using O(m®) processors such that, given an arbitrary line
segment and $k$ processors, we can determine whether it is redundant in
$O(\log m/\log k)$ time.

Proof. Since the convex chains have been precomputed, the result follows from the
fact that, given a convex chain $C$ of size $c$ and a line segment $s$, whether the
line segment intersects $C$ can be determined in $O(\log c)$ time, or in $O(\log c/\log k)$
time by a $k$-way search using $k$ processors. □
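The search in the proof can be illustrated with a sequential sketch that uses binary search, which is the case $k = 2$ of the $k$-way search. The vertical gap between the chain and the segment's supporting line forms a convex sequence along the chain. Its minimum can therefore be found in $O(\log c)$ steps, and its maximum sits at one end of the chain. The chain is built here with Andrew's monotone-chain algorithm rather than the constant-time brute-force method of the paper. The function names, the three-way result, and the assumptions that chain vertices have distinct abscissae and that the segment's x-range covers the chain are our own.

```python
from fractions import Fraction


def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_chain(points):
    """Lower convex chain of a point set, left to right (Andrew's monotone chain)."""
    chain = []
    for p in sorted(set(points)):
        while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
            chain.pop()
        chain.append(p)
    return chain


def classify_against_chain(chain, seg):
    """Return 'crosses', 'above' or 'below' for a non-vertical segment whose
    x-range covers the chain."""
    (x1, y1), (x2, y2) = seg
    slope = Fraction(y2 - y1) / Fraction(x2 - x1)

    def gap(i):  # chain height minus segment height at the i-th chain vertex
        x, y = chain[i]
        return y - (y1 + slope * (x - x1))

    # gap is convex along the chain: binary search for its minimum
    lo, hi = 0, len(chain) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if gap(mid + 1) < gap(mid):
            lo = mid + 1
        else:
            hi = mid
    lowest = gap(lo)
    highest = max(gap(0), gap(len(chain) - 1))  # a convex sequence peaks at an end
    if lowest > 0:
        return "below"  # every chain vertex lies strictly above the segment
    if highest < 0:
        return "above"  # the whole chain lies strictly below the segment
    return "crosses"
```

For the chain through $(0, 0)$, $(1, -1)$, $(2, 0)$, the segment from $(-1, -2)$ to $(3, 2)$ crosses it. The horizontal segments at heights $5$ and $-5$ are classified as above and below, respectively.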
5 The $O(\log n/\log k)$ Time Algorithm

The general technique given by Sen [21] for developing sub-logarithmic algorithms
can be used to design an $O(\log n/\log k)$ algorithm for the problem. Let the number
of line segments be $n$ and the number of processors be $nk$, with $k > \log^{\Omega(1)} n$.

1. Pick up a “good” sample R of size r = 

2. Compute the upper envelope of the sample using a brute force algorithm. 

3. Remove all the redundant line segments (those line segments that lie com- 
pletely below the upper envelope of the sample computed in Step 2). 

4. For every remaining line segment, find the number of regions that it
intersects. If the sum taken over all the line segments is $O(n)$, then continue;
else go to Step 1.

5. For every region $\sigma$, do the following:

(a) Find the line segments intersecting the region and assign $k$ times as
many processors to it (see Lemmas 3 and 4 and Step 4).

(b) If the size of the sub problem is > log^^^^ n, then recurse, else 

(c) Solve the sub-problem in constant time. With k > log^^^^ n this can be 
done in constant time. 

6. Merge the upper envelopes formed in all the sub problems. 

Theorem 3. Given $n$ straight line segments and $nk$ processors ($k > \log^{\Omega(1)} n$),
we can find their upper envelope in expected $O(\log n/\log k)$ time with high
probability.

Proof. Refer to Sen [21]. □ 



6 Remarks and Open Problems 

One of the issues that remains to be dealt with is that of speeding up the
algorithm further using a superlinear number of processors, such that the time
taken is $O(\log H/\log k)$ with $nk$ processors, where $k > 1$. Designing an algorithm
that takes $O(\log H)$ time and does optimal work for all values of $H$ is another
open problem.

The other issue is to design a sub-logarithmic time algorithm using a superlinear
number of processors for $k > 1$. The technique of Sen [18,21] for filtering the
redundant line segments to control the blowup in the problem size, and the
processor allocation scheme used there, are not particularly effective here. That
scheme allocates one processor for every output vertex contributed by a segment.
As the number of output vertices contributed by a segment cannot be predicted,
we do not know how many processors to allocate to a given line segment, and the
requirement of a segment may increase as it goes down the levels of recursion.
Abstract. We consider the problem of list decoding from erasures. We
establish lower and upper bounds on the rate of a (linear) code that can
be list decoded with list size $L$ when up to a fraction $p$ of its symbols are
adversarially erased. Our results show that in the limit of large $L$, the
rate of such a code approaches the capacity $(1 - p)$ of the erasure channel.
Such nicely list decodable codes are then used as inner codes in a
suitable concatenation scheme to give a uniformly constructive family of
asymptotically good binary linear codes of rate $\Omega(\epsilon^2/\lg(1/\epsilon))$ that can
be efficiently list decoded using lists of size $O(1/\epsilon)$ from up to a fraction
$(1 - \epsilon)$ of erasures. This improves previous results from [14] in this vein,
which achieved a rate of lg(l/e)). 

Keywords: Error-correcting codes, Linear codes, Decoding algorithms,
List decoding, Erasure channel.



1 Introduction 

The list decoding problem for an error-correcting code consists of outputting a 
list of all codewords that lie within a certain Hamming distance of the received 
word [6,19]. The decoding is considered successful as long as the correct codeword 
is included in the list. If the decoder is restricted to output lists of size L, the 
decoding is called list-of-L decoding. List decoding with even moderate-sized lists 
allows one to correct significantly more errors than possible with unambiguous 
decoding (cf. [7,13,11]). 

In this paper, we consider the problem of list decoding from erasures, i.e. 
list decoding when a certain fraction of the symbols of the received word are 
adversarially erased. Specifically, we are interested in codes with non-trivial list 
decoding performance - i.e. codes of “large” rate that are list decodable using 
“small” lists from a “large” number of erasures. Such a study has already been 
carried out with some success for the case of errors [7,20,4,11]. Though the case 
of erasures is algorithmically always easier to handle, it raises several interesting 
combinatorial questions which this paper addresses. Also, the simpler situation
of erasures makes better tradeoffs (between rate and list decodability) possible
than in the errors case. For example, for binary codes the maximum fraction
of errors one can hope to correct is $(1/2 - \epsilon)$, whereas decoding up to a
fraction $(1 - \epsilon)$ of erasures is possible. The rate (as a function of $\epsilon$) that one can
achieve in the erasures case also turns out to be better than what is possible in
the errors case.



1.1 Definitions and Notation 

We focus on linear codes throughout this paper. For simplicity we also restrict our
attention to binary codes, though all our results go through for larger alphabets
as well. Recall that a binary linear code $C$ of blocklength $n$, dimension $k$ and
minimum distance $d$, denoted an $[n,k,d]_2$ code, is a $k$-dimensional subspace
of $\mathbb{F}_2^n$ such that the minimum Hamming weight of a non-zero element (called a
codeword) of $C$ equals $d$. When we omit the distance, we refer to the code as an
$[n,k]_2$ code. The rate of the code is defined to be the ratio $k/n$, and the relative
distance is defined to be $d/n$. The main thrust of this paper is the asymptotic
performance of codes, and we therefore focus on infinite families $\mathcal{C} = \{C_i\}_{i \ge 1}$
of $[n_i, k_i, d_i]_2$ codes of increasing blocklength $n_i \to \infty$. The rate $R(\mathcal{C})$ of such a
family is defined to be $R(\mathcal{C}) = \liminf_i k_i/n_i$, and its relative distance is defined
to be $\delta(\mathcal{C}) = \liminf_i d_i/n_i$.

We next define the list decoding radius for recovering from erasures. For
$y \in \{0,1\}^n$ and $T \subseteq \{1, 2, \ldots, n\}$, define $[y]_T \in \{0,1\}^{|T|}$ to be the projection of
$y$ onto the coordinates in $T$. A code $C$ of blocklength $n$ is said to be $(s, L)$-erasure
list-decodable if for every $r \in \{0,1\}^{n-s}$ and every set $T \subseteq \{1, 2, \ldots, n\}$ of size
$(n - s)$, we have $|\{c \in C : [c]_T = r\}| \le L$. In other words, given any received word
with at most $s$ erasures, the number of codewords consistent with the received
word is at most $L$. Note that a code of minimum distance $d$ is $(d-1, 1)$-erasure
list-decodable, but is not $(d, 1)$-erasure list-decodable.
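For very small codes the definition can be checked directly by brute force, which is handy for sanity-checking examples such as the $(d-1, 1)$ versus $(d, 1)$ remark. The sketch assumes, as in Lemma 1 below, that codewords are $c = Gx$ for an $n \times k$ generator matrix $G$ given as a list of rows. The function names are hypothetical, and the running time is exponential.

```python
from itertools import combinations, product


def codewords(G):
    """All codewords c = G x of the binary code with n x k generator matrix G."""
    n, k = len(G), len(G[0])
    return {
        tuple(sum(G[row][j] * x[j] for j in range(k)) % 2 for row in range(n))
        for x in product((0, 1), repeat=k)
    }


def is_erasure_list_decodable(G, s, L):
    """Check the (s, L)-erasure list-decodability definition exhaustively."""
    n = len(G)
    code = codewords(G)
    for T in combinations(range(n), n - s):  # every set of unerased positions
        counts = {}
        for c in code:
            proj = tuple(c[t] for t in T)  # the projection [c]_T
            counts[proj] = counts.get(proj, 0) + 1
        # words r that are no codeword's projection have count 0 <= L
        if max(counts.values()) > L:
            return False
    return True
```

For example, with `G = [[1], [1], [1]]` (the $[3,1,3]_2$ repetition code), the check succeeds for $(s, L) = (2, 1)$ and fails for $(3, 1)$, as the remark predicts.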

For linear codes, list decoding from erasures is algorithmically easy as it 
just amounts to finding the space of solutions to a linear system. Thus, if a 
linear code is $(s, L)$-erasure list-decodable for some small $L$, then a list of size
at most $L$ can also be efficiently output given any received word with at most $s$
erasures. (This is not true for the case when there are errors, where algorithmic 
list decodability is potentially a lot more difficult to achieve than combinatorial 
list decodability; in fact, there are known constructions of codes with good
combinatorial list decodability, but for which efficient list decoding algorithms 
are not known.) Since our focus in this paper is on recovery from erasures, we 
only focus on linear codes with good combinatorial erasure list decodability, and 
this automatically implies a polynomial time list decoding algorithm from a large 
number of erasures. But for our main explicit code construction, we do mention 
how the decoding time can be improved beyond the time required to solve a 
linear system. 
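The remark that erasure list decoding of a linear code amounts to solving a linear system can be illustrated with plain Gaussian elimination over GF(2). The decoder keeps the unerased rows $G_T$ of the generator matrix, solves $G_T x = [y]_T$, and re-encodes every solution. This is a generic textbook sketch, not the faster decoder the paper mentions for its explicit construction. Marking erasures with `None` and the function names are our own conventions.

```python
from itertools import product


def solve_gf2(A, b, k):
    """All solutions x in {0,1}^k of A x = b over GF(2); [] if inconsistent."""
    rows = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for col in range(k):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue  # no pivot in this column: x[col] is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                rows[i] = [a ^ c for a, c in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    if any(row[-1] and not any(row[:-1]) for row in rows):
        return []  # some row reads 0 = 1
    free = [c for c in range(k) if c not in pivots]
    solutions = []  # exactly 2 ** (k - rank) of them
    for values in product((0, 1), repeat=len(free)):
        x = [0] * k
        for c, v in zip(free, values):
            x[c] = v
        for i, col in enumerate(pivots):
            x[col] = (rows[i][-1] + sum(rows[i][c] * x[c] for c in free)) % 2
        solutions.append(x)
    return solutions


def erasure_list_decode(G, received):
    """All codewords c = G x agreeing with `received` on its unerased positions."""
    k = len(G[0])
    T = [i for i, v in enumerate(received) if v is not None]
    msgs = solve_gf2([G[i] for i in T], [received[i] for i in T], k)
    return [[sum(g * x for g, x in zip(row, m)) % 2 for row in G] for m in msgs]
```

For the $[3,2,2]_2$ parity code with generator rows `[1, 0]`, `[0, 1]`, `[1, 1]`, the received word `[1, None, None]` yields the two codewords `[1, 0, 1]` and `[1, 1, 0]`, that is, $2^{k - \mathrm{rank}(G_T)} = 2^{2-1}$ candidates.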
Definition 1 (Erasure List Decoding Radius). For an integer $L \ge 1$ and a
code $C$, the list-of-$L$ erasure decoding radius of $C$, denoted $\mathrm{radius}_L(C)$, is defined
to be the maximum value of $s$ for which $C$ is $(s, L)$-erasure list-decodable.



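Definition 2 (ELDR for Code Families). For an infinite family $\mathcal{C} = \{C_i\}_{i \ge 1}$
of codes and an integer $L \ge 1$, the list-of-$L$ erasure decoding radius of $\mathcal{C}$, denoted
$\mathrm{ELDR}_L(\mathcal{C})$, is defined to be

$$\mathrm{ELDR}_L(\mathcal{C}) \stackrel{\mathrm{def}}{=} \liminf_i \left\{ \frac{\mathrm{radius}_L(C_i)}{n_i} \right\} , \qquad (1)$$

where $n_i$ is the blocklength of $C_i$.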

We now define the function which measures the trade-off achievable between 
rate and erasure list decoding radius. 

Definition 3. For an integer $L$ and $0 < p < 1$, the maximum rate of a code
family with list-of-$L$ erasure decoding radius at least $p$, denoted $R_L(p)$, is defined
as

$$R_L(p) \stackrel{\mathrm{def}}{=} \sup_{\mathcal{C}\,:\,\mathrm{ELDR}_L(\mathcal{C}) \ge p} R(\mathcal{C}) . \qquad (2)$$



1.2 Main Results 

Combinatorial Bounds. One of the objectives of this paper is to establish
lower and upper bounds on the function $R_L(p)$ (Theorem 1 and Theorem
2). Since the list-of-1 ELDR of a code family equals its relative distance $\delta$,
$R_1(\cdot)$ coincides with $R(\delta)$, the central function in coding theory that captures the
trade-off between the rate and the minimum distance of a code. One of the consequences
of the results of this paper is that in the limit of large $L$, the function $R_L(p)$ tends to
$1 - p$, thus matching the Singleton bound. This result has the following nice
interpretation. It is a classical result in coding theory that the capacity of the erasure
channel, where each codeword symbol is randomly and independently erased with
probability $p$, equals $(1 - p)$ [5]. Thus our results show that using list decoding
with large enough (but constant-sized) lists, one can approach capacity even if
the symbols that are erased are adversarially (and not randomly) chosen. Our
upper bound on $R_L(p)$ also shows that $R_L(p) < 1 - p$ for every $p$ and every fixed
list size $L$ (this result holds even for general, non-linear codes). Thus one needs
unbounded list size in order to approach the capacity of the adversarial erasure
channel using list decoding. We remark that analogous statements also hold for
the errors case from earlier results in [7,20,4,11].



Code Constructions. For any constant $\epsilon > 0$, our lower bound on $R_L(p)$ implies
the existence of a binary linear code family $\mathcal{C}_\epsilon = \{C_i\}_{i \ge 1}$ of rate $\Omega(\epsilon/\lg(1/\epsilon))$
such that each $C_i$ is $((1-\sigma)n_i, O(1/\sigma))$-erasure list-decodable for every $\sigma$ satisfying
$\epsilon \le \sigma < 1$. One can use such codes as inner codes in a concatenated scheme
with the outer code being any code of rate $\Omega(\epsilon)$ and relative distance $1 - O(\epsilon)$
(like Reed-Solomon codes or, if one wants codes over a constant alphabet size,
algebraic-geometric codes or the expander-based codes of [2]). Such a concatenation
scheme gives a uniformly constructive family of binary linear codes of rate
$\Omega(\epsilon^2/\lg(1/\epsilon))$ that can be list decoded from a fraction $(1 - \epsilon)$ of erasures using
lists of size $O(1/\epsilon)$. The previous best result in this vein from [14] achieved
a rate of 12(£^), by exploiting the fact that any family of codes with relative 
distance greater than (1 — e)/2 has a small list size for decoding from (1 — e) 
erasures. ^ The best construction of such high distance codes achieves a rate 
of (see [2]). We stress that the novelty of our result is that it is not ob- 

tained by appealing to this distance to ELDR relation, and indeed no polynomial 
time constructions of binary code families of relative distance (1/2 — 0(e)) and 
rate about are known. In fact such a construction (which will asymptotically 
match the Gilbert-Varshamov bound at low rates) will be a major breakthrough 
in coding theory. 

Moreover, by using Reed-Solomon codes as outer codes, we can get a code 
with slightly worse construction time, but which can be encoded in near-linear 
time and decoded in near-quadratic time. The decoding is based on a list decod- 
ing algorithm due to [13] and its fast implementation due to [16,17]. 

2 Combinatorial Bounds on $R_L(p)$

2.1 Lower Bound on $R_L(p)$

All logarithms in this paper are to the base 2 and will be denoted by $\lg x$. The following folklore combinatorial lemma gives a useful linear-algebraic characterization of when a linear code is $(s, L)$-erasure list-decodable.

Lemma 1. An $[n, k]_2$ linear code $C$ is $(s, L)$-erasure list-decodable if and only if its $n \times k$ generator matrix $G$ has the property that every $(n - s) \times k$ submatrix of $G$ has rank at least $(k - \lfloor \lg L \rfloor)$.

Proof: Let $T \subseteq \{1, 2, \ldots, n\}$ with $|T| = n - s$ and $r \in \{0,1\}^{n-s}$. The number of codewords $c \in C$ with $[c]_T = r$ is precisely the number of solutions $x \in \{0,1\}^k$ to the system $G_T x = r$, where $G_T$ is the submatrix of $G$ consisting of all rows indexed by elements in $T$. By standard linear algebra, the number of solutions $x$ to the linear system $G_T x = 0$ is precisely $2^{\ell}$, where $\ell = k - \mathrm{rank}(G_T)$, and for any $r \in \{0,1\}^{n-s}$, the number of solutions $x$ to $G_T x = r$ is at most $2^{\ell}$ (in fact, it is always either $0$ or $2^{\ell}$). Since $2^{\ell} \le L$ if and only if $\ell \le \lfloor \lg L \rfloor$, $G$ is $(s, L)$-erasure list-decodable if and only if for every $T \subseteq \{1, \ldots, n\}$ with $|T| = n - s$, $G_T$ has rank at least $k - \lfloor \lg L \rfloor$. □
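To make Lemma 1 concrete, the Python sketch below (an illustration, not part of the paper) checks $(s, L)$-erasure list decodability of a small code in two ways. The first applies the rank criterion of Lemma 1. The second counts directly, for every set $T$ of $n - s$ unerased positions, how many messages share each projection $[Gx]_T$. The $[7,4]$ Hamming generator matrix `HAMMING_G` and all function names are our own choices. The brute force is exponential and only meant for tiny examples.

```python
from itertools import combinations, product

def gf2_rank(rows):
    """Rank over GF(2) of a list of 0/1 rows (Gaussian elimination on bitmasks)."""
    basis = {}  # leading-bit position -> basis vector with that leading bit
    for row in rows:
        v = 0
        for bit in row:
            v = (v << 1) | bit
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)

def ld_by_rank(G, s, L):
    """Lemma 1: every (n - s) x k submatrix must have rank >= k - floor(lg L)."""
    n, k = len(G), len(G[0])
    need = k - (L.bit_length() - 1)  # L.bit_length() - 1 == floor(lg L) for L >= 1
    return all(gf2_rank([G[i] for i in T]) >= need
               for T in combinations(range(n), n - s))

def ld_by_count(G, s, L):
    """Direct check: for each unerased set T, count messages x per value of [Gx]_T."""
    n, k = len(G), len(G[0])
    encodings = [tuple(sum(g * b for g, b in zip(row, x)) % 2 for row in G)
                 for x in product((0, 1), repeat=k)]
    for T in combinations(range(n), n - s):
        counts = {}
        for c in encodings:
            key = tuple(c[i] for i in T)
            counts[key] = counts.get(key, 0) + 1
        if max(counts.values()) > L:
            return False
    return True

# Generator matrix (n = 7 rows, k = 4 columns) of the [7,4,3] Hamming code.
HAMMING_G = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
             [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]

for s, L in [(2, 1), (3, 1), (3, 2)]:
    print(s, L, ld_by_rank(HAMMING_G, s, L), ld_by_count(HAMMING_G, s, L))
# 2 1 True True
# 3 1 False False
# 3 2 True True
```

Three erasures can wipe out the support of a weight-3 codeword, so unique decoding fails. A list of size 2 always suffices, because no two distinct nonzero codewords can fit in the 3 erased positions.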

^ The term “uniformly constructive” means a polynomial time construction where the exponent of the polynomial is an absolute constant independent of $\varepsilon$.

^ More generally, any binary code of relative distance $\delta$ has a small (linear in blocklength) number of codewords for up to a fraction $2\delta$ of erasures. This is the best possible, in the sense that for every $\gamma > 0$ there exist linear codes $C_\gamma$ of relative distance $\delta$, and a received word $y$ with a fraction $(2 + \gamma)\delta$ of erasures, such that $C_\gamma$ has exponentially many codewords consistent with $y$.
Theorem 1. For every integer $L \ge 1$ and every $p$, $0 < p < 1$, we have

$$R_L(p) \ge 1 - p - \frac{H(p)}{\lceil \lg(L+1) \rceil}. \qquad (3)$$
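As a quick numerical illustration of (3), the Python sketch below evaluates the lower bound for a few list sizes. It assumes only the formula above and the standard binary entropy function $H$, and the helper names are our own. It uses the identity $\lceil \lg(L+1) \rceil = 1 + \lfloor \lg L \rfloor$, which equals `L.bit_length()` in Python, to avoid floating-point rounding in the logarithm.

```python
import math

def binary_entropy(p):
    """H(p) = -p lg p - (1 - p) lg(1 - p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rl_lower_bound(p, L):
    """Right-hand side of (3), clipped at 0 since a negative rate bound is vacuous."""
    J = L.bit_length()  # == ceil(lg(L + 1)) for every integer L >= 1
    return max(0.0, 1 - p - binary_entropy(p) / J)

p = 0.1
for L in (1, 3, 15, 255, 2**20 - 1):
    print(L, round(rl_lower_bound(p, L), 4), "vs capacity", 1 - p)
```

The printed values increase towards $1 - p$ as $L$ grows, but only logarithmically in $L$.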



The above theorem follows from Part (i) of the stronger claim of the lemma 
below. The second part of the lemma will be used in the application to con- 
catenated codes in Section 3. It asserts the existence of codes that exhibit a 
“gradual” increase in list size as more and more codeword symbols are erased, 
up to a total of (1 — e)n erasures. 



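Lemma 2. For all large enough $n$, the following hold:

(i) For every integer $L \ge 1$ and every $p$, $0 < p < 1$, there exists an $[n, k]_2$ linear code $C$ with $k = \left\lfloor \left(1 - p - \frac{H(p)}{\lceil \lg(L+1) \rceil}\right) n - \sqrt{n} \right\rfloor$ that is $(pn, L)$-erasure list-decodable.

(ii) There exist absolute constants $A, B > 0$ such that for every $\varepsilon$, $0 < \varepsilon < 1$, there exists an $[n, k]_2$ linear code $C$ with $k = \lfloor \varepsilon n / \lg(A/\varepsilon) \rfloor$ that is $(s, B/\sigma)$-erasure list-decodable, where $\sigma = (n - s)/n$, for every $s \le (1 - \varepsilon)n$.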

Proof: We first prove Part (i) of the lemma. The proof is based on the probabilistic method. We will pick a code $C$ generated by a random $n \times k$ generator matrix $G$, where $k$ is as specified in the statement of the lemma.^ We will prove that such a random code is $(pn, L)$-erasure list-decodable with high probability. By Lemma 1, this amounts to estimating the probability that a random $n \times k$ matrix over $\mathbb{F}_2$ has the property that every $t \times k$ submatrix has rank at least $(k - \lfloor \lg L \rfloor)$, where $t = (1 - p)n$. For any $J'$, $0 \le J' \le k$, the probability that a random matrix $M$ of order $t \times k$ has rank $(k - J')$ is at most $2^{(k-t)J'}$. This follows since for a fixed subspace $S$ of $\mathbb{F}_2^k$ of dimension $(k - J')$, the probability that all $t$ rows of $M$ lie in $S$ is at most $2^{-tJ'}$, and furthermore the number of subspaces of $\mathbb{F}_2^k$ of dimension $(k - J')$ is at most $2^{kJ'}$, as one can specify such a subspace as the null-space of a $J' \times k$ matrix over $\mathbb{F}_2$. Let $J = \lceil \lg(L+1) \rceil = 1 + \lfloor \lg L \rfloor$.

Then, for $t > k$, the probability that a random matrix $M$ of order $t \times k$ has rank $(k - J)$ or less is at most

$$\sum_{J'=J}^{k} 2^{(k-t)J'} \le k \cdot 2^{(k-t)J}.$$
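The tail bound above can be sanity-checked numerically. The Python sketch below uses small parameters of our own choosing and is an illustration, not part of the proof. It estimates by simulation the probability that a uniformly random $t \times k$ matrix over $\mathbb{F}_2$ has rank at most $k - J$, and compares that estimate with $k \cdot 2^{(k-t)J}$. For $J = 1$ it also computes the exact value $1 - \prod_{i=0}^{k-1}(1 - 2^{i-t})$, which is the probability that the $k$ columns are linearly dependent.

```python
import random

def gf2_rank_ints(rows):
    """Rank over GF(2) of rows given as k-bit integers."""
    basis = {}  # leading-bit position -> basis vector
    for v in rows:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)

def estimate_deficient(t, k, J, trials=20000, seed=1):
    """Monte Carlo estimate of Pr[rank(M) <= k - J] for uniform M in F_2^{t x k}."""
    rng = random.Random(seed)
    hits = sum(gf2_rank_ints([rng.getrandbits(k) for _ in range(t)]) <= k - J
               for _ in range(trials))
    return hits / trials

def tail_bound(t, k, J):
    """The bound k * 2^((k - t) J) from the text."""
    return k * 2.0 ** ((k - t) * J)

def exact_rank_deficient(t, k):
    """Exact Pr[rank(M) <= k - 1]: one minus Pr[the k columns are independent]."""
    prob_full = 1.0
    for i in range(k):
        prob_full *= 1 - 2.0 ** (i - t)
    return 1 - prob_full

t, k = 10, 6
for J in (1, 2):
    print(J, estimate_deficient(t, k, J), tail_bound(t, k, J))
print("exact for J = 1:", exact_rank_deficient(t, k))
```

The union bound over subspaces is loose by a constant factor here, but it decays at the same exponential rate in $(t - k)J$, which is all the proof needs.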






Now, by a union bound, the probability that some $t \times k$ submatrix of $G$ has rank at most $(k - J)$ is at most

$$\binom{n}{t} \cdot k \cdot 2^{(k-t)J} \le 2^{H(p)n} \cdot k \cdot 2^{(k-t)J} \le k \cdot 2^{-J\sqrt{n}} = o(1),$$

where in the second step we have used the fact that $\binom{n}{t} = \binom{n}{pn} \le 2^{H(p)n}$ (cf. [18, Chap. 1, Theorem 1.4.5]), and in the third step we substituted the value of $k$ from the statement of Part (i) of the lemma.

^ We will assume for simplicity that $C$ has dimension $k$, i.e., $G$ has full column rank, since this happens with very high probability.

Hence an $n \times k$ matrix $G$ every $t \times k$ submatrix of which has rank at least $k - J + 1 = k - \lfloor \lg L \rfloor$ exists, and therefore, by Lemma 1, there exists a $(pn, L)$-erasure list-decodable code of blocklength $n$ and dimension $k$. This proves Part (i) of the lemma.

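To prove Part (ii), we apply the above proof with $p = (1 - \varepsilon)$ and $k = \lfloor \varepsilon n / \lg(8/\varepsilon) \rfloor$, and estimate the probability that, for a fixed $s$ with $n/2 \le s \le (1 - \varepsilon)n$, $C$ is not $(s, 16/\sigma)$-erasure list-decodable. Let $\sigma = (n - s)/n$; we have $\varepsilon \le \sigma \le 1/2$. By Lemma 1, this happens only if some $(n - s) \times k$ submatrix of $G$ has rank less than $k - \lfloor \lg(16/\sigma) \rfloor$, and hence less than $k - \lg(8/\sigma)$. As in the proof of Part (i), the probability that some $(n - s) \times k$ submatrix of $G$ has rank less than $k - \lg(8/\sigma)$ is at most

$$\binom{n}{n-s} \cdot k \cdot 2^{(k - (n-s))\lg(8/\sigma)} \le \left(\frac{e}{\sigma}\right)^{\sigma n} \cdot k \cdot 2^{\left(\frac{\varepsilon}{\lg(8/\varepsilon)} - \sigma\right) n \lg(8/\sigma)} = 2^{-\Theta(n)},$$

where in the first step we used the inequality $\binom{n}{\sigma n} \le (e/\sigma)^{\sigma n}$ for $\sigma \le 1/2$, and the last step follows since $\sigma \ge \varepsilon$.

Now, by a union bound, the probability that for some $s$, $n/2 \le s \le (1 - \varepsilon)n$, $G$ is not $(s, 16/\sigma)$-erasure list-decodable is also exponentially small. Hence there exists a linear code $C$ of blocklength $n$ and rate $\varepsilon/\lg(8/\varepsilon)$ that is $(s, 16/\sigma)$-erasure list-decodable for every $s$ that satisfies $n/2 \le s \le (1 - \varepsilon)n$. The list size for $s < n/2$ erasures is at most the list size for $n/2$ erasures. The argument above bounds the latter by $2^{\lg 16} = 16 \le 16/\sigma$, so such a code is also $(s, 16/\sigma)$-erasure list-decodable for every $s$, $0 \le s \le (1 - \varepsilon)n$. This proves Part (ii) of the claim (with the choice $A = 8$ and $B = 16$ for the absolute constants).

□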
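The exponent in the Part (ii) estimate can also be checked numerically. Ignoring floors and the polynomial factor $k$, the bound above equals $2^{c(\varepsilon,\sigma) n}$ with $c(\varepsilon,\sigma) = \sigma\lg(e/\sigma) + \left(\frac{\varepsilon}{\lg(8/\varepsilon)} - \sigma\right)\lg(8/\sigma) = \sigma(\lg e - 3) + \varepsilon\frac{\lg(8/\sigma)}{\lg(8/\varepsilon)}$. The last term is at most $\varepsilon \le \sigma$, so $c \le (\lg e - 2)\sigma \approx -0.557\sigma < 0$. The Python sketch below evaluates $c$ directly on a grid. It is an illustrative check of this inequality, not a proof, and the function name is our own.

```python
import math

def part_ii_exponent(eps, sigma):
    """c(eps, sigma): the failure probability for a fixed s is about k * 2^(c * n)."""
    return (sigma * math.log2(math.e / sigma)
            + (eps / math.log2(8 / eps) - sigma) * math.log2(8 / sigma))

for eps in (0.01, 0.05, 0.1, 0.3):
    grid = [eps + (0.5 - eps) * j / 50 for j in range(51)]  # eps <= sigma <= 1/2
    worst = max(part_ii_exponent(eps, s) for s in grid)
    print(eps, round(worst, 5), "<=", round((math.log2(math.e) - 2) * eps, 5))
```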

Remark: Note that the above lemma not only guarantees the existence of codes 
with the required properties, but also proves that a random code has these 
properties with very high probability. 

2.2 Upper Bound on $R_L(p)$

We now turn to upper bounds on $R_L(p)$. It is easy to prove that for any fixed $L$ (in fact, even for a list size $L$ that is allowed to grow polynomially in the blocklength), we must have $R_L(p) \le 1 - p$. Indeed, let $C$ be a code of blocklength $n$ and rate $r$, and let $T = \{1, 2, \ldots, (1 - p)n\}$. Pick a random $y \in \{0,1\}^{(1-p)n}$ and consider the set $S_y$ of codewords $c \in C$ that satisfy $[c]_T = y$. The expected number of such codewords equals $2^{rn - (1-p)n}$, and hence there must exist a $y \in \{0,1\}^{(1-p)n}$ for which $|S_y| \ge 2^{(r - 1 + p)n}$. Since we want $|S_y| \le L \le \mathrm{poly}(n)$, we must have $r \le 1 - p + o(1)$. Hence $R_L(p) \le 1 - p$. Below, we are interested in a better upper bound on $R_L(p)$, which in particular bounds it strictly away from $(1 - p)$ for every fixed $L$ (and thereby shows that one requires unbounded list size in order to approach the “capacity” $(1 - p)$ of the erasure channel). The proof uses standard techniques similar to those used in the Plotkin and Elias-Bassalygo bounds for the rate vs. minimum distance trade-off. It will appear in the full version of the paper and can also be found in [10, Chapter 10].

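Theorem 2. For every integer $L \ge 1$ and every $p$, $0 < p < 1 - 2^{-L}$, we have

$$R_L(p) \le \min\left\{1 - H(\lambda),\; 1 - \frac{p}{1 - 2^{-L}}\right\}, \qquad (4)$$

where $\lambda$ is the unique root of the equation $\lambda^{L+1} + (1 - \lambda)^{L+1} = 1 - p$ in the range $0 \le \lambda \le 1/2$. For $p > 1 - 2^{-L}$, we have $R_L(p) = 0$.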



Corollary 3. For every integer $L$ and every $p$, $0 < p < 1$, $R_L(p) < 1 - p$.

Remark: For $L = 1$, the two bounds proven in Theorem 2 are precisely the Elias-Bassalygo and Plotkin bounds on the rate of a code in terms of its minimum distance, which state that $R(\delta) \le 1 - H\left(\frac{1 - \sqrt{1 - 2\delta}}{2}\right)$ and $R(\delta) \le 1 - 2\delta$, respectively. For large $L$, the bound $R_L(p) \le 1 - p/(1 - 2^{-L})$ is better than the other bound $R_L(p) \le 1 - H(\lambda)$ except for very small $p$ (less than about $L/2^L$).
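The upper bound of Theorem 2 is easy to evaluate numerically. The Python sketch below assumes only the statement of the theorem, and the helper names are our own. It finds $\lambda$ by bisection. The function $f(\lambda) = \lambda^{L+1} + (1-\lambda)^{L+1}$ decreases from $1$ at $\lambda = 0$ to $2^{-L}$ at $\lambda = 1/2$, so a root in $[0, 1/2]$ exists exactly when $p \le 1 - 2^{-L}$. For $L = 1$ the root should coincide with the Elias-Bassalygo value $(1 - \sqrt{1 - 2p})/2$.

```python
import math

def binary_entropy(x):
    """H(x) = -x lg x - (1 - x) lg(1 - x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def solve_lambda(p, L, iters=100):
    """Root of lam^(L+1) + (1 - lam)^(L+1) = 1 - p in [0, 1/2], by bisection."""
    f = lambda lam: lam ** (L + 1) + (1 - lam) ** (L + 1) - (1 - p)
    lo, hi = 0.0, 0.5  # f(lo) = p >= 0 and f(hi) = 2^-L - (1 - p) <= 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # f is decreasing on [0, 1/2], so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def rl_upper_bound(p, L):
    """Right-hand side of (4); R_L(p) = 0 beyond p = 1 - 2^-L."""
    if p > 1 - 2.0 ** (-L):
        return 0.0
    lam = solve_lambda(p, L)
    return min(1 - binary_entropy(lam), 1 - p / (1 - 2.0 ** (-L)))

for L in (1, 2, 8):
    print(L, [round(rl_upper_bound(p, L), 4) for p in (0.1, 0.3, 0.5)])
```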

3 Application to Concatenated Codes 

We now use the codes guaranteed by Lemma 2 (specifically Part (ii) of the lemma) as inner codes in a suitable concatenation scheme to construct binary code families of rate $\Omega(\varepsilon^2/\lg(1/\varepsilon))$ that can be efficiently list decoded using lists of size $O(1/\varepsilon)$ when up to a fraction $(1 - \varepsilon)$ of their symbols are erased. This follows from Theorem 4 below, which is the main result of this section. The proof uses the simple, yet powerful, tool of code concatenation [8]. Given an $[N, K, D]_{2^m}$ code $C_1$ and an $[n, m, d]_2$ code $C_2$, the concatenated code $C_1 \oplus C_2$ is an $[Nn, Km, \ge Dd]_2$ binary linear code whose codewords are obtained by encoding a message $x$ first using $C_1$, and then encoding each of the symbols of $C_1(x)$ using $C_2$. (Each symbol of $C_1(x)$ is an element of $GF(2^m)$ and can hence be interpreted as an $m$-bit vector, and $m$ is exactly the dimension of $C_2$.)
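The concatenation operation can be illustrated on a toy example. The Python sketch below uses parameters chosen only for illustration. The outer code is the $[3, 2, 2]_4$ code $(a, b) \mapsto (a, b, a + b)$ over $GF(4)$, where field addition is bitwise XOR on 2-bit symbols, and the inner code is the $[3, 2, 2]_2$ parity code. The concatenated code should be a $[9, 4, \ge 4]_2$ code. Since it is linear, its minimum distance equals its minimum nonzero weight.

```python
from itertools import product

M = 2  # bits per outer symbol: GF(2^M) elements viewed as M-bit vectors

def outer_encode(msg):
    """Toy [3, 2, 2] code over GF(4): (a, b) -> (a, b, a + b); addition is XOR."""
    a, b = msg
    return [a, b, a ^ b]

INNER_G = [[1, 0], [0, 1], [1, 1]]  # n x m generator of the [3, 2, 2] parity code

def inner_encode(symbol):
    """Encode one M-bit outer symbol with the binary inner code."""
    bits = [(symbol >> (M - 1 - j)) & 1 for j in range(M)]
    return [sum(g * b for g, b in zip(row, bits)) % 2 for row in INNER_G]

def concat_encode(msg):
    """Outer-encode, then inner-encode every outer symbol and concatenate."""
    out = []
    for sym in outer_encode(msg):
        out.extend(inner_encode(sym))
    return out

codewords = [concat_encode(m) for m in product(range(2 ** M), repeat=2)]
min_dist = min(sum(c) for c in codewords if any(c))
print(len(codewords), len(codewords[0]), min_dist)  # prints: 16 9 4
```

Here the bound $\ge Dd = 2 \cdot 2$ is attained. Every nonzero outer codeword has at least two nonzero symbols, and each of them is mapped to an inner codeword of weight exactly 2.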

Theorem 4. For every $\varepsilon > 0$, there exists a family of binary linear codes of rate $\Omega(\varepsilon^2/\log(1/\varepsilon))$ such that a code of blocklength $N$ in the family can be constructed in $2^{O(\varepsilon^{-3}\lg(1/\varepsilon))} + \mathrm{poly}(N, 1/\varepsilon)$ time, and can be list decoded in polynomial time using lists of size $O(1/\varepsilon)$ when up to a fraction $(1 - \varepsilon)$ of its symbols are erased.
The above result follows immediately from the following lemma.

Lemma 3. There exist absolute constants $b, d$ such that for all large enough integers $K$ and all small enough $\varepsilon > 0$, there exists a binary linear code $C_K$ that has the following properties:

(i) $C_K$ has dimension $K$ and blocklength $N \le bK\lg(1/\varepsilon)/\varepsilon^2$.

(ii) The generator matrix of $C_K$ can be constructed in $2^{d\varepsilon^{-3}\lg(1/\varepsilon)} + \mathrm{poly}(N, 1/\varepsilon)$ time.

(iii) $C_K$ is $((1 - \varepsilon)N, O(1/\varepsilon))$-erasure list-decodable (and since $C_K$ is linear, there is an $O(N^3)$ time list decoding algorithm to decode up to $(1 - \varepsilon)N$ erasures using lists of size $O(1/\varepsilon)$).



Proof: The code $C_K$ is constructed by concatenating an outer code $C_{out}$ over $GF(2^m)$ of blocklength $n_0$, dimension $k_0 = K/m$ and minimum distance $d_0$, with an inner code $C_{in}$ as guaranteed by Part (ii) of Lemma 2. By using a construction in [2], we can pick parameters so that $k_0 = K/m = \Omega(\varepsilon n_0)$, $m = O(1/\varepsilon)$ and $d_0 = (1 - O(\varepsilon))n_0$. For convenience, we hide constants using big-Oh notation, but we stress that these are absolute constants that do not depend on $\varepsilon$.^ We note that this choice of $C_{out}$ is not linear over $GF(2^m)$ (though it is additive), but a (succinct) description of $C_{out}$ can be constructed in time $\mathrm{poly}(n_0, 1/\varepsilon)$. Moreover, after concatenation with an inner binary linear code, the overall concatenated code will be a binary linear code.

The inner code $C_{in}$ will be a code as guaranteed by Lemma 2, of dimension $m$ and blocklength $n_i = O(m\lg(1/\varepsilon)/\varepsilon)$, that is $((1 - \sigma)n_i, B/\sigma)$-erasure list-decodable for every $\sigma \ge \varepsilon/2$. We can construct such a code by a brute-force search in $2^{O(m n_i)} = 2^{O(\varepsilon^{-3}\lg(1/\varepsilon))}$ time. Once we have descriptions of $C_{out}$ and $C_{in}$, one can construct the generator matrix of the concatenated code $C_K$ in $\mathrm{poly}(N, 1/\varepsilon)$ time, where $N = n_0 n_i$ is the blocklength of $C_K$. Now

$$N = n_0 \cdot n_i = O\left(\frac{K}{\varepsilon m}\right) \cdot O\left(\frac{m\lg(1/\varepsilon)}{\varepsilon}\right) = O\left(\frac{K\lg(1/\varepsilon)}{\varepsilon^2}\right).$$

This proves Parts (i) and (ii) of the statement of the lemma.

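We now prove that $C_K$ is $((1 - \varepsilon)N, O(1/\varepsilon))$-erasure list-decodable. Let $y$ be a received word with $(1 - \varepsilon)N$ erasures, let $c'_1, c'_2, \ldots, c'_M$ be the codewords in $C_K$ “consistent” with $y$, and let $c_j$ be the codeword of $C_{out}$ corresponding to $c'_j$, for $1 \le j \le M$. We wish to prove that $M = O(1/\varepsilon)$. Let $y_i$ be the portion of $y$ corresponding to the $i$'th outer codeword position, for $1 \le i \le n_0$, and let the number of unerased symbols in $y_i$ be $\sigma_i n_i$. Since $y$ has exactly $\varepsilon N = \varepsilon n_0 n_i$ unerased symbols, we have $\sum_{i=1}^{n_0} \sigma_i = \varepsilon n_0$. Define $Q = \{i : \sigma_i \ge \varepsilon/2\}$. Since $\sum_{i \notin Q} \sigma_i < \varepsilon n_0/2$, clearly we have

$$\sum_{i \in Q} \sigma_i \ge \frac{\varepsilon n_0}{2}. \qquad (5)$$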



^ Actually, we can use certain families of algebraic-geometric codes and even have $m = O(\log(1/\varepsilon))$. But we prefer to use the codes from [2] as they are easier to construct and have lower construction complexity, and we do not want to give the impression that we need complicated AG-codes for our construction. We will highlight the improvement in performance attainable using AG-codes in Section 3.2.
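Now define “weights” $w_{i,\beta}$ for $1 \le i \le n_0$ and $\beta \in GF(2^m)$ as follows. If $i \notin Q$, set $w_{i,\beta} = 0$ for all $\beta \in GF(2^m)$. If $i \in Q$, then $\sigma_i \ge \varepsilon/2$ and hence, by construction, $C_{in}$ is $((1 - \sigma_i)n_i, B/\sigma_i)$-erasure list-decodable. In other words, if $J_i = \{\beta : C_{in}(\beta) \text{ is consistent with } y_i\}$, then $|J_i| \le B/\sigma_i$. We set (for $i \in Q$):

$$w_{i,\beta} = \begin{cases} \sigma_i & \text{if } \beta \in J_i, \\ 0 & \text{if } \beta \notin J_i. \end{cases}$$

Since $|J_i| \le B/\sigma_i$, our choice of weights clearly satisfies

$$\sum_{i,\beta} w_{i,\beta}^2 = \sum_{i \in Q} |J_i|\,\sigma_i^2 \le B \sum_{i \in Q} \sigma_i. \qquad (6)$$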

We now use a combinatorial result that gives an upper bound on the number of codewords of $C_{out}$ that satisfy a certain weighted condition depending on the distance $d_0$ of $C_{out}$. This result is a “weighted” version of the classical “Johnson bounds”, and appears, for example, in [15] and [10, Section 3.4]. Specifically, this result states that, for any $\varepsilon' > 0$, the number of codewords $(a_1, a_2, \ldots, a_{n_0}) \in C_{out}$ that satisfy

$$\sum_{i=1}^{n_0} w_{i,a_i} > \left( (n_0 - d_0(1 - \varepsilon')) \sum_{i,\beta} w_{i,\beta}^2 \right)^{1/2} \qquad (7)$$

is at most $1/\varepsilon'$.

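Now for each $c_j$, $1 \le j \le M$, and each $i \in Q$, we have $c_{j,i} \in J_i$, where $c_{j,i} \in GF(2^m)$ denotes the $i$'th symbol of $c_j$. Hence $w_{i,c_{j,i}} = \sigma_i$ for every $i \in Q$, and $w_{i,c_{j,i}} = 0$ for $i \notin Q$. Thus, we have

$$\sum_{i=1}^{n_0} w_{i,c_{j,i}} = \sum_{i \in Q} \sigma_i. \qquad (8)$$

Combining Equations (5), (6) and (8), we have that the codeword $c_j$ satisfies Condition (7) as long as

$$\sum_{i \in Q} \sigma_i > (n_0 - d_0(1 - \varepsilon'))B,$$

which, by (5), can be satisfied provided $d_0 > n_0\left(1 - \frac{\varepsilon}{2B}\right)(1 - \varepsilon')^{-1}$. Picking $\varepsilon' = \frac{\varepsilon}{4B}$, we only need $d_0 = n_0(1 - O(\varepsilon))$, which is satisfied for our choice of $C_{out}$. Hence the number of codewords consistent with the received word $y$ is at most $1/\varepsilon' = O(1/\varepsilon)$, thus proving Part (iii) of the lemma as well. □ (Lemma 3)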
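The final parameter choice can be checked with exact rational arithmetic. The Python sketch below uses arbitrary sample values of $\varepsilon$ and $n_0$, together with $B = 16$, the constant from the proof of Lemma 2. It tests the sufficient inequality $\varepsilon n_0/2 > (n_0 - d_0(1 - \varepsilon'))B$, which is obtained from (5), (6) and (8) with $\varepsilon' = \varepsilon/(4B)$. With $d_0 = \lceil n_0(1 - \varepsilon') \rceil$ the slack is at least $B\varepsilon'^2 n_0 > 0$. By contrast, a distance of only $(1 - \varepsilon)n_0$ is not enough. The function names are our own.

```python
from fractions import Fraction
import math

def sufficient_condition(eps, B, n0, d0):
    """eps*n0/2 > (n0 - d0*(1 - eps'))*B with eps' = eps/(4B); by (5), (6), (8)
    every outer codeword consistent with y then satisfies Condition (7)."""
    eps_p = eps / (4 * B)
    return eps * n0 / 2 > (n0 - d0 * (1 - eps_p)) * B

def list_size_bound(eps, B):
    """The weighted Johnson bound then allows at most 1/eps' = 4B/eps codewords."""
    return 4 * B / eps

B = 16
for eps in (Fraction(1, 100), Fraction(1, 20), Fraction(1, 5)):
    for n0 in (100, 1000, 12345):
        d0 = math.ceil(n0 * (1 - eps / (4 * B)))  # exact ceiling of a Fraction
        assert sufficient_condition(eps, B, n0, d0)
    print(eps, list_size_bound(eps, B))
```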

Remark: The previous results from [14] (obtained using code constructions 
from [2,3]) along the lines of the above theorem achieved a blocklength of N = 
min{0(=^), 0(1|)}. Thus, our result achieves the best features of both these 
bounds and gets $N = O(K/\varepsilon^2)$ (hiding the $\lg(1/\varepsilon)$ factor).
3.1 Obtaining Near-Linear Encoding and Quadratic Decoding Times

The codes constructed in Lemma 3, being linear, can be encoded in $O(N^2)$ time and list decoded in $O(N^3)$ time by solving a linear system. By using Reed-Solomon codes as the outer code, the encoding can be accomplished in $\tilde{O}(N)$ time (here $\tilde{O}(N)$ denotes $O(N\,\mathrm{poly}(\lg N))$). This is because the outer encoding can be performed in $\tilde{O}(N)$ time using FFT-based methods, and then the inner encoding just takes $O(\lg^2 N)$ time for each of the outer codeword symbols. For decoding, the inner codes can be decoded in $O(\lg^3 N)$ time by solving a linear system. The Reed-Solomon code can then be decoded from a set of weights that satisfy Condition (7), using a near-quadratic time implementation (cf. [16,17]) of the weighted Reed-Solomon decoding algorithm from [13]. However, the code family is no longer uniformly constructive, as one must now search for a good inner code of dimension $\Omega(\lg N)$. By contrast, in the construction of Lemma 3 the inner code has a constant dimension depending only on $\varepsilon$. In fact, a naive search among all inner codes will take time which is quasi-polynomial in $N$. However, one can use the method of conditional expectations to achieve a deterministic construction in time polynomial in $N$, with an exponent that depends on $\varepsilon$; we omit the details.
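As noted above and in Lemma 3 (iii), erasure list decoding of a binary linear code reduces to solving $G_T x = y_T$ over $GF(2)$ and enumerating the affine solution space. By the proof of Lemma 1, that space has size $2^{k - \mathrm{rank}(G_T)}$ when it is nonempty. The Python sketch below is a generic Gaussian-elimination implementation of that idea, not the paper's code. Erased positions are represented by `None`, and `HAMMING_G` is an illustrative $[7,4]$ generator matrix. Elimination costs $O(nk^2)$ bit operations, which is $O(N^3)$ for blocklength $N$ and dimension $k \le N$.

```python
from itertools import product

def erasure_list_decode(G, y):
    """All messages x with G_T x = y_T over GF(2), T = unerased positions of y.
    G: n x k 0/1 generator matrix; y: length-n list of 0, 1 or None (erased)."""
    k = len(G[0])
    rows = [list(G[i]) + [y[i]] for i in range(len(G)) if y[i] is not None]
    pivots, r = [], 0
    for col in range(k):  # reduce [G_T | y_T] to reduced row echelon form
        piv = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    # Rows r.. are zero on the coefficient part; a 1 on the right means 0 = 1.
    if any(row[k] for row in rows[r:]):
        return []
    free = [c for c in range(k) if c not in pivots]
    solutions = []
    for values in product((0, 1), repeat=len(free)):
        x = [0] * k
        for c, v in zip(free, values):
            x[c] = v
        for i, col in enumerate(pivots):  # pivot row i: x[col] + sum_free row[c] x[c] = rhs
            x[col] = (rows[i][k] + sum(rows[i][c] * x[c] for c in free)) % 2
        solutions.append(tuple(x))
    return solutions

HAMMING_G = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
             [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]

def encode(G, x):
    return [sum(g * b for g, b in zip(row, x)) % 2 for row in G]

x = (1, 0, 1, 1)
y = encode(HAMMING_G, x)
y[0] = y[1] = y[2] = None  # erase three positions
print(erasure_list_decode(HAMMING_G, y))  # a list of two messages, one of them x
```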



3.2 Improving Construction Time Using Algebraic-Geometric Codes

Despite the impressive encoding and decoding times, the drawback of the above construction with Reed-Solomon codes as the outer code is the rather high exponent of $N$ in the construction time. In particular, the code family ceases to be uniformly constructive, since the construction time is no longer of the form $O(f(\varepsilon)N^a)$, where $a$ is an absolute constant independent of $\varepsilon$.

Instead of Reed-Solomon codes, we can use AG-codes of distance $(1 - O(\varepsilon))$ as the outer code. Such codes of rate $\Omega(\varepsilon)$ exist over an alphabet of size $O(1/\varepsilon^2)$ (any family of AG-codes that meets the so-called Drinfeld-Vladut bound will satisfy this, cf. [10, Section 6.3.9]). Now, the dimension $k$ of the inner binary code is only $O(\log(1/\varepsilon))$, and one can deterministically find a binary linear inner
code which has the required erasure list decodability properties in (i/e)) 

time. 

Since the overall code is linear, the claims of quadratic encoding time and cubic decoding time still hold. Thus, we can obtain the same properties as
the codes from Lemma 3 with an improved construction time of (i/^)) _|_ 

$\mathrm{poly}(N, 1/\varepsilon)$. We formally record this improvement below.

Theorem 5. For every $\varepsilon > 0$, there exists a family of binary linear codes of rate $\Omega(\varepsilon^2/\log(1/\varepsilon))$ such that a code of blocklength $N$ in the family can be constructed
in 2^ ^ ' -|-poly(A, 1/e) time, and can be list decoded in polynomial time using 

lists of size $O(1/\varepsilon)$ when up to a fraction $(1 - \varepsilon)$ of its symbols are erased.
4 Concluding Remarks

Our lower bound on $R_L(p)$ from Theorem 1 guarantees the existence of binary linear code families of rate $\Omega(\varepsilon)$ which can be list decoded from up to a fraction $(1 - \varepsilon)$ of erasures using lists of size, say, $\varepsilon^{-2}$. The construction of Theorem 4, however, only achieves a rate of about $\varepsilon^2$. It was recently observed in [12] that the result of Theorem 1 is actually “tight” in the “high-noise regime”. Specifically, they show that for linear codes, in order to have positive rate, one requires a list size of at least $\Omega(1/\varepsilon)$ to list decode from a fraction $(1 - \varepsilon)$ of erasures. This indicates that our results cannot be improved by the choice of a better linear code as the inner code.

Furthermore, as pointed out to us by Alon [1] (see also the formal discussion in [10, Section 10.6.4]), a $\mathrm{poly}(N, 1/\varepsilon)$ time construction of a binary linear code family with three properties would imply a significant improvement for explicit constructions of certain bipartite Ramsey graphs. The three properties are a rate better than $\varepsilon^2$ (for example, $\varepsilon^{c}$ for some constant $c < 2$), $\mathrm{poly}(1/\varepsilon)$-sized lists, and tolerance of up to a fraction $(1 - \varepsilon)$ of erasures. In this sense, improving our results further seems to be a significant challenge. There is, however, some hope, since we allow for a construction that runs in time exponential in $1/\varepsilon$ and/or a list size that is exponential in $1/\varepsilon$. It is an interesting open question whether this can be exploited to beat the “$\varepsilon^2$ barrier” for the rate.

If one allows larger alphabets, known constructions of large-distance algebraic-geometric codes imply code families of ELDR $(1 - \varepsilon)$ and rate $\Omega(\varepsilon)$ over an alphabet of size $O(1/\varepsilon^2)$, even for a list size of 1. Explicit constructions of linear codes of rate better than $\Omega(\varepsilon^2)$ with ELDR $(1 - \varepsilon)$, even for a slightly smaller alphabet, do not appear to be known. However, in recent work, [12] present constructions of non-linear codes over large alphabets (of size independent of $\varepsilon$) that can be efficiently list decoded from a fraction $(1 - \varepsilon)$ of erasures and which have rate better than $\Omega(\varepsilon^2)$ (in fact, they have rate approaching $\Omega(\varepsilon)$ as the alphabet size grows). Details on this construction also appear in [10, Section 10.7].

It is also an interesting question whether a code construction similar to Theorem 5 can be obtained in time which is polynomial in both $N$ and $1/\varepsilon$.
Abstract. The Timed Asynchronous System (TAS) model [3] has less stringent assumptions than the synchronous model but is still strong enough to serve as a foundation for the construction of dependable applications. In this paper, we verify the correctness of some basic distributed services in TAS. First, TAS is modelled, and then some important properties of two basic services, FADS (Fail Aware Datagram Service) and HALL (Highly Available Local Leader Election Service), are formally verified. The PVS theorem prover is used for modelling and verification of the algorithms.

During the process of verification, some assumptions of the model that were not explicitly noted in the literature came to light. In addition, the insight gained during verification showed that, through appropriate modifications of these assumptions, the validity of some of the properties can be extended in the face of additional failures in the system.



1 Introduction 

Distributed systems can be broadly characterized as synchronous, partially synchronous or asynchronous [2]. Synchronous systems involve strict timing constraints and bounded failure frequency assumptions, while asynchronous systems, being a time-free model, do not involve any timing bounds. Asynchronous systems allow processors to crash and messages to be dropped or delivered late (performance failures). As strictly asynchronous systems lack control over the scheduling of events in the system, essential services like consensus, election or membership have been shown to be unimplementable in them. A partially synchronous model is timing-based, with some restrictions on the relative timing of events. For example, it may assume almost-synchronized clocks or approximate bounds on process step time or message delivery time.

The Timed Asynchronous System (TAS) model [3], a timed model, has more complex assumptions than the synchronous model, but they are easier to achieve in practical systems. They are, however, still strong enough to serve as a foundation for the construction of dependable applications. For example, the hardware clock drift rate is assumed to be bounded, but the clocks of different processors are not assumed to be synchronized. In TAS:
1. All services are timed: A specification of a service prescribes a time bound
within which state transitions are expected to occur. 

2. Interprocess communication is unreliable: Messages can suffer omission fail- 
ure (message dropped) and/or performance failure (message delivered late). 

3. Processes have crash/performance failure semantics. 

4. Processes have access to hardware clocks with bounded drift. 

5. Frequency of communication and process failures is unbounded. 

This timed model consists of a finite set of processes that communicate via an unreliable datagram service. Processes have access to local hardware clocks. The datagram service provides the send/receive primitives. A one-way time-out delay $\delta$ is chosen such that messages are likely to be delivered within $\delta$ real-time units. A process management service ensures that a process has a maximum scheduling delay of $\sigma$ real-time units. Other properties of the timed model include communication by time and conditional timeliness. Conditional timeliness requires guaranteed progress within a bounded amount of time only when a set of processes forms a stable partition. This is in contrast to conventional timeliness, which requires progress to be guaranteed within a bounded time unconditionally.

To guarantee liveness or progress, or in real-time applications, it is often necessary to tag a message as fast or slow on delivery, a posteriori. For example, in leader election, extremely delayed messages may need to be ignored for progress^. They have to be ignored for correctness if they are from earlier rounds. Similarly, security protocols must ignore such messages to avoid replay attacks.

The Fail Aware Datagram Service (FADS) [4] calculates an upper bound on the transmission delay of each message. By comparing this bound with a threshold $\Delta$, it classifies messages as slow (bound greater than $\Delta$) or fast. A critical property that has to be shown for FADS is that a fast message may be delivered as slow, but a slow message must never be delivered as fast. Various other services can be built upon fail awareness, as shown in Figure 1.

Several protocols in TAS, e.g., HALL (highly available leader election service) [7], use a leasing or locking mechanism to communicate by the passage of time. For example, during a “lease”, there can be a promise not to change a value (the value is “locked”). The lease expires with or without communication; we only need some loose clock synchronization and some estimate of the delays to be able to make progress. We also require only one fail-aware datagram message instead of a round-trip message pair. Communication takes place between processes even when the network is overloaded or partitioned. This mechanism is particularly useful for processes that switch a system to a safe mode.

Initial measurements on actual systems [3] give evidence that the timed model adequately describes current distributed systems such as a network of workstations. Also, essential services like leader election are implementable [7]. Fail awareness can be applied to partitionable systems where communication is not certain due to network failures or excessive performance failures. Servers in each logical partition are allowed to make progress independently of servers in other partitions.

^ Note the lack of “progress” in the US presidential elections in Nov 2000 because of delayed (“slow”) votes!

Fig. 1. Hierarchy of fail-aware services (from [3])

In this paper, we verify the correctness of some basic distributed services in 
TAS. First, TAS is modelled and then some important properties of two basic 
services, FADS (Fail Aware Datagram Service) and HALL (Highly Available 
Local Leader Election Service), are formally verified. The PVS theorem prover [6]
is used for modelling and verification of the algorithms. 

The rest of the paper is organized as follows. Section 2 discusses two services implemented over TAS, and Section 3 discusses the modeling and verification aspects involved in using PVS.

2 Two Distributed Services in TAS 

2.1 Fail Aware Datagram Service 

In many practical situations in distributed systems, there is a need to reject old or stale messages. For instance, in leader election, a process needs to reject any message not belonging to the current round. Also, to use the leasing mechanism of communication by time, a process has to be able to determine an upper bound on the one-way transmission delay of messages. FADS (Fail Aware Datagram Service) [1][4] calculates such an upper bound for each message it delivers, and delivers the message as either fast or slow. FADS can be used in partitionable systems to provide “clean partitions”, i.e., even when slow messages from other partitions arrive, higher level protocols see non-overlapping “logical partitions” [7].

A process computes, for each message $m$ it receives, an upper bound on the transmission delay of $m$ by measuring on its local clock the duration of an asynchronous datagram round trip which includes $m$. If the calculated upper bound is at most $\Delta$, $m$ is delivered as fast, and otherwise as slow, where $\Delta$ is chosen as given below. Since timestamps of earlier messages are required to calculate the upper bound, it is only required that all timely messages between two processes are delivered as fast once the processes have been connected for some bounded time $\gamma$. (A timely message has a transmission delay smaller than $\delta$.) The upper bound is defined as $ub(m) = (D - A)(1 + \rho) - \delta_{min} - (C - B)(1 - \rho)$, where $\delta_{min}$ is the minimum transmission delay of a message (it can be set to 0). $fa\text{-}deliver^t_p(m, q, flag)$ denotes that message $m$ from $q$ is delivered to $p$ at time $t$; the $flag$ denotes whether the message is fast or slow. Two processes are said to be connected iff they are not crashed and all messages sent between them in the given time interval are timely. FADS has the following properties in addition to the datagram service provided by the TAS model [3]:



[Figure 2: timeline with intervals labelled $\delta_{min}$, $(C - B)(1 - \rho)$ and $(D - A)(1 + \rho)$]

Fig. 2. Transmission delay $td(m) = d - c$ is bounded by calculating an upper bound for the length of $[a, d]$ and lower bounds for $[a, b]$ and $[b, c]$ (from [4])



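1. Fail Awareness: $\forall p, q, m, t : fa\text{-}deliver^t_p(m, q, fast) \Rightarrow \exists s \ge t - \Delta : send^s_q(m, p) \vee broadcast^s_q(m)$

2. Timeliness: $\forall p, q, m, t : connected(p, q, t - \gamma, t + \Delta) \wedge (send^t_q(m, p) \vee broadcast^t_q(m)) \Rightarrow \exists s \le t + \Delta : fa\text{-}deliver^s_p(m, q, fast)$.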

$\Delta$ is chosen so that the calculated upper bound for any timely message is at most $\Delta$, i.e., $td(m) \le ub(m) \le \Delta$. A lower bound for $\gamma$ is also derived. Each process keeps arrays to store the timestamps of messages sent to or received from other processes. Each process broadcasts every $r$ clock time units so that all connected processes can update their timestamps for this process. It is also interesting to note that fail awareness sometimes allows a timely message to be delivered as slow, but a message with $td > \Delta$ is never allowed to be delivered as fast! Measurements [4] have shown that, by choosing $r$, $\Delta$ and $\delta$ appropriately, the calculated upper bound is quite close to the actual transmission delay and the above-mentioned properties hold. These properties are verified using PVS (see below).
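The fast/slow decision can be sketched in a few lines. The Python code below is illustrative, and the function names and sample clock readings are our own. It evaluates $ub(m)$ as defined above and delivers a message as fast when $ub(m) \le \Delta$. Following Fig. 2, we assume that $A$ and $D$ are readings of the receiver's clock, taken when it sends an earlier message and when it receives $m$. Likewise, $B$ and $C$ are readings of the sender's clock, taken when it receives that earlier message and when it sends $m$. The sample readings model a receiver clock with no drift and a sender clock with offset 50 and rate $1.00005$.

```python
def upper_bound(A, B, C, D, rho, delta_min=0.0):
    """ub(m) = (D - A)(1 + rho) - delta_min - (C - B)(1 - rho)."""
    return (D - A) * (1 + rho) - delta_min - (C - B) * (1 - rho)

def fa_flag(A, B, C, D, rho, Delta, delta_min=0.0):
    """'fast' iff the calculated upper bound is at most Delta, else 'slow'."""
    return "fast" if upper_bound(A, B, C, D, rho, delta_min) <= Delta else "slow"

# Real times: p sends m' at 0.0, q receives it at 0.004,
# q sends m at 1.0, and p receives m at 1.003 (true td(m) = 0.003).
rho = 1e-4
A, D = 0.0, 1.003                          # receiver clock (exact)
B, C = 50 + 0.004 * 1.00005, 50 + 1.00005  # sender clock: offset 50, rate 1.00005
ub = upper_bound(A, B, C, D, rho)
print(ub, fa_flag(A, B, C, D, rho, Delta=0.01), fa_flag(A, B, C, D, rho, Delta=0.005))
```

In this example the bound exceeds the actual delay of 0.003, so with the smaller threshold a timely message is delivered as slow. This is permitted, whereas the converse error is not.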
2.2 Local Leader Election Service

In partitionable systems, one wants progress to be made in all partitions to increase the availability of services. But excessive performance failures make it impossible to elect a leader in all partitions. Hence the highly available leader election (HALL) problem [7] only requires that a leader is elected within a bounded amount of time in a stable partition.

The unstable partitions are split into “logical partitions”. A logical partition is a set of processes such that the local leader can communicate in a timely manner with all processes in the logical partition. The election of a local leader has to be supported by all processes connected to it. HALL creates logical partitions so that

— Logical partitions never overlap 

— Stable partitions are subsets of logical partitions

— Two local leaders are always in two separate partitions 

— Processes in one partition are not connected to the local leader of any other 
logical partition 

The HALL election service uses an independent assessment protocol to approximate the set of processes in a stable partition. The leadership is also renewed in a round-based fashion. A leasing mechanism to communicate by time is used to ensure that a process is, at any time, in at most one logical partition.

Process $p$ is $\Delta$-disconnected from $q$ in $[s, t]$ iff all messages $p$ receives in $[s, t]$ from $q$ have a transmission delay $td > \Delta$ real-time units. A stable partition $SP$ is a $\Delta$-partition, defined as:

$$\Delta\text{-}partition(SP, s, t) \equiv (SP \neq \emptyset) \wedge (\forall p, q \in SP : connected(p, q, s, t)) \wedge (\forall p \in SP, \forall r \in P - SP : \Delta\text{-}disconnected(p, r, s, t))$$
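For a fixed interval $[s, t]$, the $\Delta$-partition predicate can be evaluated mechanically once the $connected$ and $\Delta\text{-}disconnected$ relations on that interval are known. The Python sketch below mirrors the three conjuncts of the definition. Its inputs are hypothetical sets of ordered pairs. Pairs $(p, p)$ are not checked, i.e., we assume that a process in $SP$ is trivially connected to itself.

```python
def is_delta_partition(SP, P, connected, delta_disconnected):
    """Delta-partition(SP, s, t) for one fixed interval [s, t].
    connected, delta_disconnected: sets of ordered pairs (p, q) that hold on [s, t]."""
    if not SP:                                   # SP != emptyset
        return False
    if any((p, q) not in connected               # members pairwise connected
           for p in SP for q in SP if p != q):
        return False
    return all((p, r) in delta_disconnected      # Delta-disconnected from outsiders
               for p in SP for r in P - SP)

P = {1, 2, 3, 4}
connected = {(1, 2), (2, 1), (3, 4), (4, 3)}
delta_disconnected = {(p, r) for p in (1, 2) for r in (3, 4)}
print(is_delta_partition({1, 2}, P, connected, delta_disconnected))     # True
print(is_delta_partition({1, 2, 3}, P, connected, delta_disconnected))  # False
```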

The goal of the HALL service is to elect one leader in each stable partition. 
Leader p{t) denotes if process p is leader at time t. supports* {p, q) denotes that p 
supports q’s election at time t. The HALL election service satisfies the following 
properties: 

1. (T) Timeliness: a local leader is elected after at most k time units: $\forall SP \subseteq P, \forall t : stablePartition(SP, t-k, t) \Rightarrow \exists p \in SP, \exists s \in [t-k, t] : Leader_s(p)$

2. (SO) Support at most one: a process cannot support the election of two different leaders at the same time: $\forall t, \forall p, q, r \in P : supports_t(p, q) \land supports_t(p, r) \Rightarrow q = r$

3. (BI) Bounded inconsistency: a process can be connected to two leaders simultaneously for at most k time units: $\forall p, q, t : connected(p, q, t-k, t) \land Leader_t(p) \Rightarrow supports_t(q, p)$

4. (LS) Leader self support: a leader supports itself: $\forall p, t : Leader_t(p) \Rightarrow supports_t(p, p)$
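Properties (SO) and (LS) each refer to a single instant, so they can be checked on one snapshot of the system. The Python sketch below does this using a hypothetical representation: `supports` is a set of pairs (p, q) meaning $supports_t(p, q)$, and `leaders` is the set of p with $Leader_t(p)$. Properties (T) and (BI) would also need the history over [t − k, t], so they are not covered here.

```python
def support_at_most_one(supports: set[tuple[int, int]]) -> bool:
    """(SO): no process supports two different processes at the same time."""
    target_of: dict[int, int] = {}
    for p, q in supports:
        # setdefault keeps the first target seen for p; a different q violates SO
        if target_of.setdefault(p, q) != q:
            return False
    return True

def leader_self_support(leaders: set[int], supports: set[tuple[int, int]]) -> bool:
    """(LS): every leader supports itself."""
    return all((p, p) in supports for p in leaders)

snapshot = {(1, 1), (2, 1), (3, 1)}
print(support_at_most_one(snapshot), leader_self_support({1}, snapshot))  # True True
```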

SUPPORTS, the reflexive, symmetric, and transitive closure of the supports predicate, creates logical partitions. Each process uses a leasing (or locking) mechanism to communicate by time when supporting the leader. The protocol involves parameters such as

- aliveSet_p: the set of processes from which p has received a fast message in the last expires clock time units

- expirationTime: a leader has to renew its leadership within this time

- lockTime: the time for which a process p supports another process q
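Because SUPPORTS is an equivalence relation, its classes (the logical partitions) can be computed from a snapshot of supports with a standard union-find. The Python sketch below assumes supports is given as a finite set of pairs at one fixed time. It is a generic equivalence-closure computation, not the PVS model.

```python
def logical_partitions(procs: list[int], supports: set[tuple[int, int]]) -> list[set[int]]:
    """Equivalence classes of the reflexive, symmetric, transitive closure of supports."""
    parent = {p: p for p in procs}  # reflexive: every process starts in its own class

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p, q in supports:  # symmetric + transitive: merge the classes of p and q
        rp, rq = find(p), find(q)
        if rp != rq:
            parent[rp] = rq

    classes: dict[int, set[int]] = {}
    for p in procs:
        classes.setdefault(find(p), set()).add(p)
    return sorted(classes.values(), key=min)

print(logical_partitions([1, 2, 3, 4, 5], {(1, 1), (2, 1), (3, 1), (4, 4), (5, 4)}))
# [{1, 2, 3}, {4, 5}]
```

A process that neither supports nor is supported by anyone ends up in a singleton class, which reflects the reflexive part of the closure.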

The bounds for the constants k, expires, and lockTime are derived in [7] so that the above four properties hold. It is also verified that these four properties solve the local leader election problem in partitionable systems.



3 Modeling and Verification 

TAS is modeled in PVS using its specification language. Theories already present in PVS, such as reals, relations, and finite_sets, are imported, and the related lemmas given in prelude.pvs are reused. For example, the theory relations is used to model the equivalence relation SUPPORTS, while the theory finite_sets_def is used to specify logical partitions. A modular approach, supported in PVS through the import of theories, is followed to model the complete system.

The assumptions which the FADS and the HALL make are initially modeled 
as axioms. All the properties are formally verified using these axioms and the 
PVS prover commands and strategies [6]. Then the actual protocols for the FADS 
and HALL are modeled and the axioms assumed by the above services are proved 
to be correct. 

3.1 Modelling Time Dependent Variables 

In the leader election protocol, various variables (e.g. lastRequest_p), flags (e.g. lockflag_p) and sets (e.g. aliveSet_p, replySet_p, supportSet_p) belonging to each process vary with time. For instance, the set replySet_p of a process p is modified whenever a reply supporting p's election arrives from another process, and lastRequest_p is modified whenever p broadcasts an election message. lockflag_q is modified when q locks to (supports) some process p, in which case it is set to true, and when lockTime clock time units have expired after q locked to p, in which case it is set to false.

However, these variables and sets are reinitialised before the next round starts, and lockflag is sticky once it becomes true within a round. We can therefore remove their time dependencies by taking snapshots at the beginning and end of rounds. Since the properties to be verified need only hold at the end of a round, the intermediate values of these variables, flags and sets need not be considered.



3.2 Modelling Clocks Along with Approximations 

Most of the parameters involved in TAS are reals, but they are also constrained; for example, ρ lies strictly between 0 and 1, which is captured through subtyping.

x: VAR real

rhotype: NONEMPTY_TYPE = {x | x > 0 AND x < 1}
rho : rhotype
Also, “realtime” and “clocktime” are defined as two separate types. The hardware clock H(p) maps realtime to clocktime for process p. Processes are of type natural number, and msg is a non-empty type.

clocktime: TYPE = {x | x >= 0}
realtime : TYPE = {x | x >= 0}
proc : TYPE = nat

H(p: proc)(rt: realtime): clocktime
msg: NONEMPTY_TYPE

While trying to prove some lemmas and theorems, some implicit assumptions became apparent, and they have been modeled as axioms in PVS. For example, the OnetoOne axiom states that the hardware clock of any process maps distinct real times to distinct clock times, i.e. H(p) is injective. This is needed to model the strictly monotonic nature of the hardware clocks.

OnetoOne: AXIOM (FORALL (p: proc, s, t: rt): H(p)(s) = H(p)(t) IFF s = t)

ρ² is equated to zero only in the derivation of upper/lower bounds for various parameters, hence the following axiom is needed. Since PVS uses only linear decision procedures, no inconsistency or derivation of ρ = 0 is possible from it.

RhoApprox: AXIOM rho*rho = 0

Since ρ is close to zero, $(1+\rho)^{-1} \approx 1-\rho$ and $(1-\rho)^{-1} \approx 1+\rho$, where ρ is the maximum clock drift. To handle such approximations, another axiom is introduced in addition to the usual axiom for the correctness of the clock. correctH1 can easily be verified mathematically using the above approximations and correctH.

correctH(p: proc)(u: realtime): bool =
  (FORALL (s: realtime, t: realtime | s < t AND t <= u):
    (1-rho)*(t-s) <= H(p)(t) - H(p)(s) AND (1+rho)*(t-s) >= H(p)(t) - H(p)(s))

correctH1(p: proc)(u: realtime): bool =
  (FORALL (s: realtime, t: realtime | s < t AND t <= u):
    t-s <= (H(p)(t) - H(p)(s))*(1+rho) AND t-s >= (H(p)(t) - H(p)(s))*(1-rho))

correctAX: AXIOM (FORALL (p: proc, u: realtime): correctH(p)(u) => correctH1(p)(u))
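Passing from correctH to correctH1 discards only second-order terms. For example, the lower bound of correctH gives t − s ≤ (H(p)(t) − H(p)(s))/(1 − ρ), and correctH1 replaces this with (H(p)(t) − H(p)(s))(1 + ρ). The small Python check below is our own illustration, not part of the PVS development. It shows that the discarded amounts are exactly ρ²/(1 + ρ) and ρ²/(1 − ρ) per unit of clock difference, and both vanish once RhoApprox sets ρ·ρ to 0.

```python
def approximation_errors(rho: float) -> tuple[float, float]:
    """Errors dropped by (1+rho)^-1 ~ 1-rho and (1-rho)^-1 ~ 1+rho.

    Algebraically: 1/(1+rho) - (1-rho) = rho**2/(1+rho)
                   1/(1-rho) - (1+rho) = rho**2/(1-rho)"""
    e_plus = 1 / (1 + rho) - (1 - rho)
    e_minus = 1 / (1 - rho) - (1 + rho)
    return e_plus, e_minus

for rho in (1e-2, 1e-4, 1e-6):
    e_plus, e_minus = approximation_errors(rho)
    print(f"rho={rho:g}: dropped terms {e_plus:.3e}, {e_minus:.3e}")
```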



3.3 FADS 

“Fail awareness”, the underlying assumption in FADS, ensures that a message with transmission delay > Δ is never delivered as “fast”. Fail awareness itself is proved next from the datagram protocol specification.

AXIOM (FORALL (p: proc, q: proc, m: msg, t: rt): fa_deliver(t)(q)(m, p, fast) IMPLIES
  EXISTS (s: rt | s >= t-DELTA AND s < t): send(s)(p)(m, q) OR broadcast(s)(p)(m))

% a slow message will never be delivered fast

LEMMA1: LEMMA (FORALL (p: proc, q: proc, m: msg, s, t: rt):
  (t > DELTA + s AND (send(s)(p)(m, q) OR broadcast(s)(p)(m)) AND
   (FORALL (u: rt | u > s AND u < t): NOT (send(u)(p)(m, q)
      OR broadcast(u)(p)(m)))) IMPLIES NOT fa_deliver(t)(q)(m, p, fast))
The upper bound on the one-way transmission delay (see Section 2.1) is calculated as follows. From the underlying protocol, the minimum-delay assumption and the bounded clock drifts, one can prove that the delay is bounded by this calculated upper bound.

ub(A, B, C, D: ct): real = (D - A)*(1 + rho) - (C - B)*(1 - rho) - deltamin

UBound: LEMMA (FORALL (A, B, C: ct, m: msg, p, q: proc, t: rt):
  (Send(A, B)(C, m)(p, q) AND Receive(A, B)(C, t, m)(q, p) AND correctH(p)(t)
   AND correctH(q)(t)) IMPLIES td(q)(m) <= ub(A, B, C, H(q)(t)))

% fail awareness

FAILAWARE: LEMMA (FORALL (p, q: proc, m: msg, t: rt): fa_deliver(t)(q)(m, p, fast) =>
  EXISTS (s: rt | s >= t-DELTA AND s < t): send(s)(p)(m, q) OR broadcast(s)(p)(m))

By making sure that Δ is greater than the calculated upper bound and by using the UBound lemma above, the fail awareness property is proved.
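The ub formula can be evaluated directly. Section 2.1 is not part of this excerpt, so the timestamp roles below are our reading of the formula, not a quotation. A and D are read on the receiver q's clock (q sends an earlier message to p, and q receives m). B and C are read on p's clock (p receives q's message, and p sends m). Under that reading, (D − A)(1 + ρ) over-approximates the real round trip, (C − B)(1 − ρ) under-approximates how long p held the message, and δmin lower-bounds the first leg. The `is_fast` rule is a hypothetical illustration of fail awareness.

```python
def ub(A: float, B: float, C: float, D: float, rho: float, deltamin: float) -> float:
    """Mirror of the PVS definition:
    ub(A, B, C, D) = (D - A)*(1 + rho) - (C - B)*(1 - rho) - deltamin"""
    return (D - A) * (1 + rho) - (C - B) * (1 - rho) - deltamin

def is_fast(A: float, B: float, C: float, D: float,
            rho: float, deltamin: float, DELTA: float) -> bool:
    # Hypothetical rule: deliver as "fast" only if the computed bound is at most DELTA,
    # so a message whose real delay exceeds DELTA can never be classified as fast.
    return ub(A, B, C, D, rho, deltamin) <= DELTA

print(ub(0, 10, 12, 20, 0.001, 1.0))  # approximately 17.022
```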

To prove the “Timeliness” requirement, the lower bound on γ and a helper axiom are used. The axiom states that each process broadcasts messages every τ clock time units and that the processes are connected. Non-linear arithmetic lemmas from the real_props theory and the approximations listed earlier are also needed.

% helper: each timely process p broadcasts at least every tau clock time units
AXIOM (FORALL (s, t: rt, p: proc | H(p)(t) - H(p)(s) >= tau): Timely(p)(s)(t) =>
  EXISTS (u: rt, m: msg, C: ct | u >= s AND u < t AND H(p)(u) = C): Broadcast(C, m)(p))

% lower bound for gamma

GAMM: AXIOM gamma >= tau*(1 + rho) + delta
% timeliness requirement

TI: LEMMA (FORALL (p, q: proc, m: msg, t: realtime | t >= gamma):
  (connected(p, q)(t-gamma, t+DELTA) AND send(t)(p)(m, q)) IMPLIES
  EXISTS (s: realtime | s <= t + DELTA): fa_deliver(s)(q)(m, p, fast))

The effort involved in proving fail awareness and timeliness from the protocol properties is much higher than that of proving LEMMA1 assuming fail awareness, since the datagram-level protocol involves timestamps and non-linear bounds for parameters such as γ and Δ.

3.4 HALL 

The HALL election service requires two invariants to be satisfied. Invariant INV0 states that two processes that have been connected for k real time units cannot both be leaders at the same time. INV1 disallows two leaders in a trio of processes p, q, r in which q is connected to both p and r, while p and r are Δ-disconnected.

% timeliness condition T:

AXIOM FORALL (SP: setof[proc], t: rt | t > k): stablePartition(SP)(t-k, t) =>
  EXISTS (p: proc, s: rt | LiesIn(s)(t-k, t)): SP(p) AND Leader(p)(s)
% support at most one SO:

AXIOM FORALL (t: rt, p, q, r: proc): supports(t)(r, p) AND supports(t)(r, q) => p = q
% bounded inconsistency BI:

AXIOM FORALL (p, q: proc, t: rt | t > k):

  (connected(p, q)(t-k, t) AND Leader(p)(t)) IMPLIES supports(t)(q, p)

% leader supports itself LS:

AXIOM FORALL (p: proc, t: rt): Leader(p)(t) => supports(t)(p, p)

INV0: LEMMA FORALL (p, q: proc, t: rt | t > k AND p /= q):

  connected(p, q)(t-k, t) IMPLIES NOT (Leader(p)(t) AND Leader(q)(t))

% invariant: p, q, r no two leaders can exist (after k time units)

INV1: LEMMA FORALL (p, q, r: proc, t: rt | t > k AND p /= r): (connected(p, q)(t-k, t)
  AND connected(q, r)(t-k, t) AND Delta-disconnected(p, r)(t-k, t)) IMPLIES
  NOT (Leader(p)(t) AND Leader(r)(t))

Logical partitions (logPart) are formed by the “supports” predicate. That logical partitions do not overlap follows from the fact that SUPPORTS is an equivalence relation, and lemma OVERLAP states this fact. Lemma LPLDR states that two leaders cannot exist simultaneously in a logical partition, and lemma SPLDR states that in a stable partition there can be at most one leader after a bounded time (k real time units).

To prove that no two leaders can exist in a logical partition, we use the fact that two processes p and q are in the same logical partition iff there exists an undirected path from p to q. The predicates “undirected” and “directed” are defined for the “supports” relation.

% equivalence relation on supports gives SUPPORTS

AXIOM FORALL (p, q: proc, t: rt): supports(t)(p, q) => SUPPORTS(t)(p, q)

AXIOM FORALL (t: rt): equivalence?(SUPPORTS(t))

AXIOM FORALL (t: rt, p, q: proc): SUPPORTS(t)(p, q) IFF logPart(t)(p) = logPart(t)(q)

% there exists a directed path from a to b using the supports predicate
directed(a: proc)(b: proc)(t: rt): INDUCTIVE bool = (supports(t)(a, b) OR
  (EXISTS (c: proc | c /= a): supports(t)(a, c) AND directed(c)(b)(t)))
% undirected path from a to b in the supports predicate

undirected(a: proc)(b: proc)(t: rt): bool = (directed(a)(b)(t) OR directed(b)(a)(t)
  OR (EXISTS (s: proc | s /= a AND s /= b): (directed(a)(s)(t) AND directed(b)(s)(t))))

% p, q belong to the same logical partition only if undirected path(p, q, t)

AXIOM FORALL (p, q: proc, t: rt | p /= q AND logPart(t)(p) = logPart(t)(q)):
  undirected(p)(q)(t)

% logical partitions don't overlap

OVERLAP: LEMMA FORALL (p, q, r: proc, t: rt | p /= q AND p /= r): (logPart(t)(p) = logPart(t)(q)
  AND logPart(t)(p) = logPart(t)(r)) => logPart(t)(q) = logPart(t)(r)

% there can be at most one leader in a logical partition
LPLDR: LEMMA FORALL (p, q: proc, t: rt | p /= q): (logPart(t)(p) = logPart(t)(q))
  IMPLIES NOT (Leader(p)(t) AND Leader(q)(t))

% after k time units there is at most one leader per stable partition
SPLDR: LEMMA FORALL (SP: setof[proc], t: rt | t > k): stablePartition(SP)(t-k, t) =>

  NOT EXISTS (p, q: proc | p /= q): SP(p) AND SP(q) AND Leader(p)(t) AND Leader(q)(t)
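On a finite snapshot, the inductive predicate directed and the derived predicate undirected can be evaluated with plain graph reachability. In the Python sketch below, supports is a finite set of pairs at a fixed time, so the t argument of the PVS versions is dropped. Any path of one or more steps can be shortened so that its first hop is either a direct edge or a process c ≠ a. As a result, directed(a, b) coincides with b being reachable from a in at least one step.

```python
def reachable(supports: set[tuple[int, int]], a: int) -> set[int]:
    """All b with directed(a)(b): b is reachable from a in one or more supports steps."""
    succ: dict[int, set[int]] = {}
    for x, y in supports:
        succ.setdefault(x, set()).add(y)
    seen: set[int] = set()
    stack = list(succ.get(a, ()))
    while stack:  # iterative depth-first search
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(succ.get(x, ()))
    return seen

def directed(supports: set[tuple[int, int]], a: int, b: int) -> bool:
    return b in reachable(supports, a)

def undirected(supports: set[tuple[int, int]], a: int, b: int) -> bool:
    """directed(a, b) OR directed(b, a) OR a common target s distinct from a and b."""
    ra, rb = reachable(supports, a), reachable(supports, b)
    return b in ra or a in rb or bool((ra & rb) - {a, b})

S = {(1, 1), (2, 1), (3, 1), (4, 4)}
print(undirected(S, 2, 3), undirected(S, 2, 4))  # True False
```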
The HALL protocol uses a leasing mechanism to communicate by time when a process q supports p's election: the support is valid only for lockTime clock time units. Lemma LOCK states that a process cannot be locked to two processes at a time. This is the invariant used to prove that a process can support at most one process at a time. Leader self support is proved from the fact that a process declares itself leader only if its aliveSet and supportSet are the same. Lemma HA1 states that a logical partition is formed k time units after the formation of a stable partition. Also, a process cannot be connected to the local leader of another logical partition for more than a bounded amount of time (k), which is stated by lemma HAS.

% leader until expiration time
AXIOM FORALL (p: proc, t: rt): Leader(p)(t) IFF
  (H(p)(t) < expirationTime(p) AND imLeader(p))

% supports predicate definition: q supports p

AXIOM FORALL (p, q: proc, t: rt): supports(t)(q, p) IFF (Leader(p)(t) AND
  member(q, supportSet(p)))

% a process can be locked to at most one process at a time
LOCK: LEMMA FORALL (p, q, r: proc, t: rt | r /= p): NOT (locked(t)(q, p)

  AND locked(t)(q, r))

% if q supports p, q is locked to p

SUPPLOCK: LEMMA FORALL (p, q: proc, t: rt | t > gamma + 2*DELTA):
  supports(t)(q, p) => locked(t)(q, p)

% support at most one SO

SO: AXIOM FORALL (t: rt, p, q, r: proc | t > gamma + 2*DELTA):
  supports(t)(r, p) AND supports(t)(r, q) IMPLIES p = q
% leader supports itself

LS: AXIOM FORALL (p: proc, t: rt | t > gamma + 2*DELTA): Leader(p)(t) => supports(t)(p, p)
% within k time units after a stable partition is formed, a logPart is formed
HA1: LEMMA FORALL (t: rt, SP: setof[proc], p: proc | t > k):

  (stablePartition(SP)(t-k, t) AND SP(p) AND Leader(p)(t)) IMPLIES
  FORALL (q: proc | SP(q)): logPart(t)(q) = logPart(t)(p)

% processes in a partition not connected to local ldr of any other logPart
HAS: LEMMA FORALL (p, q: proc, t: rt | t > k): (Leader(q)(t) AND

  logPart(t)(p) /= logPart(t)(q)) IMPLIES NOT connected(p, q)(t-k, t)

The proof of lemmas INV0, INV1, SPLDR, LPLDR etc. from the axioms SO, LS etc. is quite straightforward. Proving that the underlying protocol satisfies these axioms is more difficult. For example, consider proving the axiom SO (support at most one). After a stable partition is formed, process p broadcasts an election request message to all the processes in its aliveSet as soon as its own id becomes less than that of all processes in its aliveSet. On receiving the election request, process q locks to process p if its lockflag is not set and p's id is less than that of all processes in q's aliveSet. In this case q sets its lockflag to true for expires clock time units and sends a supportive reply to p. If p receives a supportive reply from q, p adds q to its supportSet. Process q supports p if and only if q belongs to p's supportSet. Process q cannot lock to two processes simultaneously (lemma LOCK). Hence, to prove axiom SO, it suffices to prove that q supporting p implies that q is locked to p (lemma SUPPLOCK). For this to be true, q's lockflag has to be set at the time process p checks for q's support.

One approach to proving lemma SUPPLOCK is to start with process p broadcasting an election request message. With this approach, however, we need to be able to prove that if lockflag is false and some conditions hold, then lockflag is set to true, which is quite difficult without temporal logic. A more suitable approach is to start with q supporting p. Then p is the leader and q lies in the same partition as p. Moreover, q has received p's election request message and locked to p. Together with expirationTime being less than lockTime clock time units, this proves the necessary property. Thus the dynamically changing value of lockflag is handled appropriately.
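The Python sketch below models the lease behind lemmas LOCK and SUPPLOCK. The class and method names are hypothetical, times are readings of q's local clock, and message exchange and rounds are ignored. The sketch only shows why LOCK holds by construction (there is a single holder slot) and why q's support lapses once lockTime clock units have passed.

```python
import math
from typing import Optional

class LeaseLock:
    """Hypothetical model of q's lockflag: a lease held by at most one process."""

    def __init__(self, lock_time: float) -> None:
        self.lock_time = lock_time          # plays the role of lockTime
        self.holder: Optional[int] = None   # the process q is locked to, if any
        self.expires = -math.inf            # q's clock time at which the lease lapses

    def locked_to(self, now: float) -> Optional[int]:
        """The process q is locked to at clock time `now`, or None after expiry."""
        return self.holder if now < self.expires else None

    def try_lock(self, p: int, now: float) -> bool:
        """Lock to p only if no lease is currently active (lockflag not set)."""
        if self.locked_to(now) is not None:
            return False
        self.holder, self.expires = p, now + self.lock_time
        return True

q = LeaseLock(lock_time=5.0)
print(q.try_lock(1, 0.0), q.try_lock(2, 3.0), q.try_lock(2, 6.0))  # True False True
```

A leader whose expirationTime ends before the lease of every supporter runs out can never be supported by a process that is no longer locked to it. This is the intuition behind the SUPPLOCK argument above.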

3.5 What Has Been Learned? 

An interesting behavior of the system, which is not obvious but becomes apparent in the verification process, is that properties like “LS” and “SO” are valid only after γ + 2Δ real time units from the formation of the stable partition. This is because it takes at least this much time for a process to declare itself leader. This also follows from the bounded inconsistency property, but is not explicit there.

Using PVS has helped us uncover differing assumptions in the literature: for example, while FADS does not assume stable partitions, HALL does. A casual reader might think that neither assumes them. The bounds derived for HALL assume that the number of untimely messages is 0. We can generalize the results by giving connected one more parameter F (the maximum number of failures), using Δ-F partitions, and deriving a new bound for k in HALL (at most F times the old k).

The verification involves more than 150 proof steps, and the generated proof file exceeds 1000 lines. The non-linearity of the bounds for various parameters and the case splits add to the effort involved (around 1000 lines of PVS code).

4 Related Work 

In [5], a round-based synchronous algorithm modeled as a functional program was first transformed into an untimed system and then into a time-triggered system, in which bounds for various parameters were derived and the system was verified. It was also shown that the global states of the time-triggered system and the untimed system are in one-to-one correspondence. The system we consider, however, is event-based, with no bound on the transmission delays of messages. Also, the notion of a round differs from case to case because of the complexity of the underlying protocol. For instance, the leader broadcasts election messages every election period, while a process locks to a leader for a lockTime interval, and renewal of leadership takes place with yet another period. The assumptions in [5] include bounded drift and clock synchronization. Also, the synchronous nature of the algorithm and the presence of rounds allow the system to be implemented as a time-triggered system, which simplifies its verification. In our case we deal with distributed algorithms in a partially synchronous system with a variable notion of a round, which makes the modelling and verification more difficult. However, the bounds for the parameters of the algorithm are calculated in a similar way in our case. Faults in [5] were modeled by perturbations of functions. In our case the concept of a Δ-partition limits the number of message drops.

5 Conclusion 

We believe that Timed Asynchronous Systems, with their fail awareness and conditional timeliness guarantees, provide a better framework for distributed systems. They address the need to reject old messages and allow essential services to be implemented even in partitionable systems.

We have studied Timed Asynchronous Systems and two of its services (fail 
awareness and local leader election) from a verification perspective. We were able 
to formally verify the properties of interest in this system (the PVS source files 
can be accessed at http://drona.csa.iisc.ernet.in/ gopi/fsttcs2001/). Approxima- 
tions and time dependencies of variables have to be handled carefully. Verifica- 
tion of the system was useful and brought out many issues which were not quite 
obvious from the specification. 
Abstract. We investigate average efficient adders for grid-based environments related to current Field Programmable Gate Arrays (FPGAs) and VLSI-circuits. Motivated by current trends in FPGA hardware design we introduce a new computational model, called the A-wired grid model. The parameter A describes the degree of connectivity of the underlying hardware. Among others, this model covers two-dimensional cellular automata for A = 0 and VLSI-circuits for A = 1. To formalize input and output constraints of such circuits we use the notion of input and output schemas. It turns out that the worst case time and area complexity are highly dependent on the specific choice of I/O schemas. We prove that a set of regular schemas supports efficient algorithms for addition whose time and area bounds match lower bounds for a broad class of I/O schemas.

We introduce new schemas for average efficient addition on FPGAs and show that addition can be done in expected time $O(\log\log n)$ for the standard VLSI model and in expected time $O(\sqrt{\log n})$ in the pure grid model. Furthermore, we investigate the rectangular area needed to perform addition with small error probability, called area with high probability. Finally, these results are generalized to the class of prefix functions.



1 Introduction 

To investigate the average time behavior of parallel systems we introduced the notion of average-complexity for circuits [16]. It turns out that for many basic Boolean functions like addition, threshold, or comparison there are special circuits that provide a substantial speedup on average for a wide class of probability distributions. This analysis was extended to a broader class of functions, the prefix functions, in [17,12,13]. It is not known how Boolean circuits with optimal average case behavior can be used in practice. A straightforward approach is to use these average-efficient circuits in asynchronous VLSI-design. However, new problems arise when it comes to verification, efficient combination, and placement of such circuits. Note that the analysis of expected time and worst time of a given circuit is a computationally infeasible task [14]. Determining these parameters is a precondition for predicting hazards or calculating local clocks. Hence, it seems to be necessary to design new circuits reflecting the restrictions given by the VLSI-environment.
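The average-case speedup for addition rests on a well-known observation: for uniformly random inputs, carries rarely travel far. The Python sketch below is our own illustration, not a circuit from [16]. It measures the longest run of propagate positions (a_i XOR b_i = 1). A carry generated below such a run must ripple through all of it, so the run length plus one bounds how many positions any carry travels in a ripple-carry adder. For random inputs this length typically grows only about like log2(n).

```python
import random

def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Longest run of consecutive propagate positions (a_i XOR b_i = 1) in n-bit inputs."""
    longest = run = 0
    for i in range(n):
        if ((a >> i) ^ (b >> i)) & 1:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest

if __name__ == "__main__":
    n = 1024
    samples = [longest_carry_chain(random.getrandbits(n), random.getrandbits(n), n)
               for _ in range(200)]
    print(sum(samples) / len(samples))  # small compared with n = 1024
```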

Field Programmable Gate Arrays (FPGAs) are a promising computational device for the implementation of average efficient circuits. FPGAs evolved from programmable array logics, which can be configured to compute primitive Boolean functions, e.g. by defining clauses of a conjunctive normal form. Over time, more and more computational power was given to the basic logical units, called logic blocks, while the increasing degree of chip integration increased their number. Programmable Gate Arrays introduced in [6] (see also [9,10,11] and [7,20] for surveys) can be used as an alternative to traditional hardware design: to implement the functionality of special processors, e.g. for mobile telephones or hand-held devices, it is cheaper and faster to configure an FPGA-chip than to make an Application Specific Integrated Circuit (ASIC). Thus FPGAs greatly reduce integrated circuit manufacturing time and prototype costs. The FPGAs' universal layout allows the emulation of any specific hardware after appropriate configuration. The underlying concept is a sea of gates: a uniform array of logic blocks with a regular interconnection network.

FPGAs of the latest generation, e.g. [1,2,23,24], provide up to some 60k freely programmable logic blocks, some 64k bit RAM, and a clock frequency of 100 MHz, which at first sight seems not much compared to the current generation of processors and memory chips. But FPGAs are inherently parallel and fully programmable (some are even reconfigurable during computation). Furthermore, the integration of FPGAs has not yet reached the same level as that of ASICs.

FPGAs are used as co-processors for high speed designs by implementing a variety of compute-intensive arithmetic functions including FFT, discrete-cosine transforms (DCT), and interpolators [22]. Furthermore, they are useful for cryptographic and cryptanalytic applications: in [3] it is suggested to use FPGAs for a brute force attack against the DES crypto-system. More computational power is only achievable with a substantially longer development period and higher costs.

There are two basic approaches to designing the interconnection of gates within an FPGA: the routing design and the mesh design. In the routing design, logic blocks are arranged along the boundary of the chip and the interior is completely occupied by a routing network. In the mesh design, logic blocks are positioned in a grid and each block is connected to its direct neighbors, and a restricted number of long wires facilitates long range interconnections of logic blocks. The mesh design allows a greater number of logic blocks. On the other hand, its interconnection structure is more restricted than in the routing design.

In the mesh design the interconnections between logic blocks vary widely between manufacturers and between a manufacturer's chip series. Some offer connections to diagonally adjacent neighbors, arbitrary zigzag-lines, buses for cells in the same row or column, connections to the second next neighbor, connections to the fourth next neighbor, and so forth. Mesh based FPGAs provide a subset of these connection structures, and not every combination of them may be used in a given configuration. There are also differences in the computational power of the logic blocks, in the number of clocks, and in the distribution and amount of memory. For a general approach it is necessary to define a computational model which covers a wide range of different FPGA designs.

In this paper we introduce a computational model for FPGAs, the A-wired grid model, motivated by common FPGA design principles [7]. The parameter A addresses the amount of available long range interconnections. For A = 0 it corresponds to the model of two-dimensional cellular automata. For A = 1 it describes the theoretical model of VLSI-circuits as discussed in [21]. If A is large enough it models FPGAs in the routing design. As in VLSI, we investigate time and area as complexity measures in the A-wired grid model.

The computational complexity of basic functions like sorting, addition, multiplication, and many more is well studied in the VLSI-circuit model [21]. One achievement was to find ultimate lower bounds for time and area based on the planarity of chips and the speed of light. But matching upper bound results do not necessarily lead to practical VLSI-algorithms. To achieve practical solutions, input and output compatibility has to be realized. Our approach to I/O compatibility is the notion of input and output schemas, which determine the meaning of an input or output bit at a pin position at any time.

We focus on the time and area complexity of the addition of long binary numbers as well as on arbitrary confluent prefix functions as introduced in [17]. For a specific input and output schema, Brent and Kung [4] show that addition can be implemented in time $O(\log n)$ and area $O(n)$, where the area counts both gates and wires. Their construction implements the Ladner-Fischer carry-propagation network [18] for partially pipelined input, i.e. an input is partitioned into a sequence of parallel blocks of equal length. In this paper we show that this bound is best possible for a wide class of schemas, called compact schemas. These bounds are also optimal if the input and output schemas are not compact but closely related, and if A, the degree of connectivity, is arbitrarily high.
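Addition is a prefix computation over (generate, propagate) pairs, and carry-propagation networks exploit this structure. The Python sketch below illustrates the idea with a simple Kogge-Stone style recursive-doubling scan that needs about log2(n) rounds. It is not the Ladner-Fischer network of [18] or the Brent-Kung layout of [4], and it ignores area entirely.

```python
def prefix_carries(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Carry out of every bit position (LSB first, carry-in 0) via recursive doubling.

    Each position starts with (g, p) = (a AND b, a XOR b). The associative operator
    (g_hi, p_hi) o (g_lo, p_lo) = (g_hi OR (p_hi AND g_lo), p_hi AND p_lo)
    is applied at distances 1, 2, 4, ..., so ceil(log2 n) rounds suffice."""
    n = len(a_bits)
    G = [x & y for x, y in zip(a_bits, b_bits)]
    P = [x ^ y for x, y in zip(a_bits, b_bits)]
    d = 1
    while d < n:
        # every position combines with the one d below it, all "in parallel"
        G, P = (
            [G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(n)],
            [P[i] & P[i - d] if i >= d else P[i] for i in range(n)],
        )
        d *= 2
    return G

def add_via_prefix(a: int, b: int, n: int) -> int:
    """n-bit addition whose sum bits are p_i XOR (carry into position i)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    carries = prefix_carries(a_bits, b_bits)
    total = 0
    for i in range(n):
        carry_in = carries[i - 1] if i > 0 else 0
        total |= (a_bits[i] ^ b_bits[i] ^ carry_in) << i
    return total | (carries[n - 1] << n)  # final carry becomes bit n

print(add_via_prefix(45, 29, 6))  # 74
```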

We introduce new schemas for efficient addition on FPGAs and show that addition can be done in expected time $O(\log\log n)$ for the standard (1-wired grid) model and in expected time $O(\sqrt{\log n})$ in the pure grid model. Furthermore, we investigate the size of a rectangular area that correctly performs addition with high probability, called area w.h.p. Following the ideas in [19], area w.h.p. can help reduce the overall area: e.g. many adders using this smaller area can share one large, area-consuming worst-case adder, which is only necessary for a small fraction of the inputs.

This paper is structured as follows. In Section 2 we introduce the notion of I/O schemas 
and a computational model for FPGAs, the A -wired grid model. In Section 3 we present 
efficient schemas and prove tight bounds for the computational complexity of addition 
of two binary numbers. In Section 4 we focus on algorithms for the average case. 



2 The A -Wired Grid Model 

An algorithm on an FPGA occupies a rectangular area $A = m_1 \cdot m_2$ consisting of $m_1 \times m_2$ programmable gates $(g_{i,j})_{i \in [m_1], j \in [m_2]}$, called logic blocks, and an interconnection network connecting each logic block to four neighbor gates. Inputs and outputs may only occur at the $2m_1 + 2m_2$ frontiers $(g_{i,0})_{i \in [m_1]}$, $(g_{i,m_2+1})_{i \in [m_1]}$, $(g_{0,j})_{j \in [m_2]}$, and $(g_{m_1+1,j})_{j \in [m_2]}$ ($[n]$ denotes the set $\{1, \dots, n\}$). For the interconnection structure we distinguish between the grid, where interconnections are allowed only between adjacent gates, and the A-wired connection network, where cells have a constant number of wires connecting them with distant neighbors according to the configuration of an FPGA connection network. Input and output occur only on a subset of the frontiers, called pins. We address the input pins by $p_1, \dots, p_r$. The computation of a circuit is done in rounds (e.g. synchronized by a local clock). In each round a gate receives the outputs of its neighbors and computes a pre-configured function depending on these outputs and its state.
To obtain high performance, these algorithms should be integrated into a data flow. Therefore, it is very important that input and output time behavior is compatible. Pins may be used in a serial or parallel way to interchange input and output data. In general, serial I/O helps to reduce the area needed for computation, while parallel I/O reduces the time needed for the whole computation. It is well known that a combination of these communication patterns can provide both qualities [4]. Our formalism for I/O compatibility is called an input or output schema. We assume a unit time model for the cycle time, i.e. at each time step a new bit may arrive at each pin.

Definition 1. An input or output schema (I/O schema) $P \subseteq [n] \times [r] \times \mathbb{N}$ of a vector $x_1, \dots, x_n$ for pins $p_1, \dots, p_r$ is a set of triples $(i, j, t)$, each of which describes that input bit $x_i$ is available at pin $p_j$ at time point $t$. For a schema $P$ and pin $p_j$ we define the start time $T_s$ and the final time $T_f$ by $T_s(P, j) := \min_{(i,j,t) \in P} t$, $T_s(P) := \min_{(i,j',t) \in P} t$, $T_f(P, j) := \max_{(i,j,t) \in P} t$, and $T_f(P) := \max_{(i,j',t) \in P} t$.
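The start and final times of Definition 1 can be computed directly from a schema stored as a set of triples. The three schema constructors in the Python sketch below are our own illustrative examples, not schemas from the paper. They show the serial, parallel, and block-wise (partially pipelined) I/O patterns discussed above.

```python
from typing import Optional

Schema = set[tuple[int, int, int]]  # triples (i, j, t): bit x_i is at pin p_j at time t

def _times(P: Schema, j: Optional[int]) -> list[int]:
    return [t for (_i, pin, t) in P if j is None or pin == j]

def start_time(P: Schema, j: Optional[int] = None) -> int:
    """T_s(P, j) if a pin j is given, otherwise T_s(P)."""
    return min(_times(P, j))

def final_time(P: Schema, j: Optional[int] = None) -> int:
    """T_f(P, j) if a pin j is given, otherwise T_f(P)."""
    return max(_times(P, j))

def serial_schema(n: int) -> Schema:            # one pin, one bit per time step
    return {(i, 1, i - 1) for i in range(1, n + 1)}

def parallel_schema(n: int) -> Schema:          # n pins, all bits at time 0
    return {(i, i, 0) for i in range(1, n + 1)}

def blocked_schema(n: int, r: int) -> Schema:   # r pins, consecutive blocks of r bits
    return {(i, (i - 1) % r + 1, (i - 1) // r) for i in range(1, n + 1)}

print(final_time(serial_schema(8)), final_time(parallel_schema(8)), final_time(blocked_schema(8, 4)))
# 7 0 1
```

Serial input uses a single pin but needs n time steps. Parallel input needs n pins at time 0. The blocked schema trades pins for time between these two extremes.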

The grid model, defined below, is the basic computational model for FPGAs in the mesh 
design (see Fig. 1). It is similar to two-dimensional cellular automata. 

Fig. 1. The Grid-model (panel labels: Input, Pins p0 to p3, Frontiers, Logic Blocks, Output)
Fig. 2. The A-wired grid model



Definition 2. An algorithm in the grid model consists of an $m_1 \times m_2$ gate array $(g_{i,j})_{i \in [m_1], j \in [m_2]}$ and frontiers $(g_{i,0})_{i \in [m_1]}$, $(g_{i,m_2+1})_{i \in [m_1]}$, $(g_{0,j})_{j \in [m_2]}$, and $(g_{m_1+1,j})_{j \in [m_2]}$, which are connected grid-like by

$$E = \{\{g_{i-1,j}, g_{i,j}\} \mid j \in [m_2], i \in [m_1 + 1]\} \cup \{\{g_{i,j-1}, g_{i,j}\} \mid j \in [m_2 + 1], i \in [m_1]\}.$$

In the configuration phase each logic block $g_{i,j}$ can be configured to implement any function $h_{i,j} : \Sigma^5 \to \Sigma^5$ for a finite alphabet $\Sigma$. The computation phase starts at time 0. At every time step $t$ a tuple in $\Sigma^5$ describes the state of gate $g_{i,j}$. The internal state is given by $v_{i,j,t}$, and each of the other four values is transmitted to the corresponding neighbor. Unless gates are pins, for $t = 0$ all values are initialized with a special symbol $\beta$. This symbol $\beta$ also describes the values of all non-pin frontiers. In each round $t > 0$, gate $g_{i,j}$ computes its new state by applying $h_{i,j}$ to its internal state and the four values transmitted to it by its neighbors in round $t-1$.

The values of the pins are determined by the input string and an input schema. Unused positions of the I/O schema are indicated by the special symbol $\beta$.

FPGAs in the mesh design have an additional number of long wires for fast interconnections. The high functionality of the logic blocks results in larger area and slower timing compared to a hardware layout of the same functionality. In FPGA design this is compensated by a number of wires which can be interconnected by switches. Such a switch connects the ends of four wires n, e, s, w, named by the cardinal directions. It features three configurations, which describe all three possible two-by-two connections, i.e. {{n, s}, {e, w}}, {{n, e}, {s, w}}, and {{n, w}, {s, e}}. To simplify our model we assume bidirectional communication along the wires.
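The three switch settings are exactly the ways of pairing the four wire ends. A small Python sketch (hypothetical helper names) that enumerates them and, assuming the bidirectional wires described above, routes a value through a configured switch:

```python
PORTS = ("n", "e", "s", "w")


def switch_configurations(ports=PORTS):
    """Enumerate the ways to pair four ports into two disjoint bidirectional wires."""
    first = ports[0]
    configs = []
    for partner in ports[1:]:
        rest = frozenset(p for p in ports if p not in (first, partner))
        configs.append(frozenset({frozenset({first, partner}), rest}))
    return configs


def route(config, port):
    """Return the port at which a value entering through `port` leaves the switch."""
    for wire in config:
        if port in wire:
            (other,) = wire - {port}
            return other
    raise ValueError(f"unknown port {port!r}")


configs = switch_configurations()
print(len(configs))  # 3
```

Fixing the first port and choosing its partner gives each pairing exactly once, which is why the loop produces three configurations.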

Definition 3. For a switch or logic block $p$, let $p^d$ be its port to the cardinal direction $d \in \{n, e, s, w\}$. For a port $p^d$ let $v(p^d, t) \in \Sigma$ be the input value received by $p$ from direction $d$ at time $t$ and $w(p^d, t) \in \Sigma$ the corresponding output value. If two ports $p^d$ and $q^{d'}$ are connected, we have $v(p^d, t) = w(q^{d'}, t)$ and $w(p^d, t) = v(q^{d'}, t)$.

There is no common design principle in practice for the use of switches; for an overview see [7]. It turns out that many mesh-based FPGA designs support the Δ-wired connection model, which is defined as follows:

Definition 4. The Δ-wired connection network connects the ports $g^d_{i,j}$ of the logic blocks, for $i \in [m_1]$, $j \in [m_2]$, $d \in \{n, e, s, w\}$, with the switches $V_{i,j,k}$ and $D_{i,j,k}$ for $i \in [m_1 - 1]$, $j \in [m_2 - 1]$, $k \in [\Delta]$. The connections $E_\Delta$ are defined analogously to Fig. 2. For a configuration $C$ of all switches, the resulting connection network $E_\Delta(C)$ describes a matching of all ports of the logic blocks $g_{i,j}$.

In practice there is a variety of connection networks for FPGAs which essentially provide features similar to the Δ-connection network. The parameter Δ varies between FPGA designs: e.g. on the Xilinx XC4000E one can directly implement a 10-connection network, and for the Xilinx XC4000X one can achieve Δ = 13 (the switches used in the Xilinx architecture correspond to our model).

Note that the parameter Δ also describes the relationship between the size of the logic blocks and the area needed for wires, since in many hardware designs the area allocated for logic blocks is approximately as large as the area needed for wiring. We combine this connection network with the grid model:

Definition 5. An algorithm in the Δ-wired grid model consists of an $m_1 \times m_2$ gate array and a corresponding Δ-connection network. In the configuration phase a function $h_{i,j}: \Sigma^5 \to \Sigma^5$ is assigned to each logic block $g_{i,j}$ and a configuration $C$ of the network is chosen. The computation proceeds analogously to the computation in the grid model, except that in each round $t \ge 0$ the following computation takes place:

$(q_{i,j,t+1}, w(g^n_{i,j}, t+1), w(g^e_{i,j}, t+1), w(g^s_{i,j}, t+1), w(g^w_{i,j}, t+1)) := h_{i,j}(q_{i,j,t}, v(g^n_{i,j}, t), v(g^e_{i,j}, t), v(g^s_{i,j}, t), v(g^w_{i,j}, t))$.

For Δ = 0 the Δ-wired grid model is equivalent to the grid model. An algorithm in the Δ-wired grid model computes a function $f$ w.r.t. I/O schemas $P_I$ and $P_O$ if for every input $x$ given in accordance with $P_I$ it outputs $f(x)$ using schema $P_O$.
Unlike for machine models such as Turing machines or directed acyclic circuits, it is not clear which notion of time is most important for computations on FPGAs. We use the following notions. The over-all time is the number of all time steps needed to compute the function, i.e. the time difference between the appearance of the first input and the delivery of the last output. The latency $\Delta_s$ is the time difference between the first given input and the first valid output. The follow-up time $\Delta_f$ is the time difference between the last available input bit and the last output. The standard measure is, of course, the over-all time. The follow-up time is the decisive measure for pipelines consisting of many sub-circuits. The latency measures the best possible speedup: e.g., if another algorithm $\mathcal{A}$ waits for the outputs of a computation and $\mathcal{A}$'s computation depends only on one input element, then $\mathcal{A}$'s computation may start even before the preceding computation (with small latency) is finished. To be able to take advantage of such effects, we assume that there are acknowledgement mechanisms as presented in [8,16].
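Assuming input and output schemas are given as sets of $(i, j, t)$ triples and every scheduled output is valid, the three time measures reduce to differences of extreme time stamps. A hedged Python sketch (the function name is illustrative):

```python
def time_measures(input_schema, output_schema):
    """Return (over-all time, latency, follow-up time) for two schemas of (i, j, t) triples."""
    t_in = [t for (_, _, t) in input_schema]
    t_out = [t for (_, _, t) in output_schema]
    overall = max(t_out) - min(t_in)    # first input appears ... last output delivered
    latency = min(t_out) - min(t_in)    # first input ... first valid output
    follow_up = max(t_out) - max(t_in)  # last input ... last output
    return overall, latency, follow_up


P_in = {(1, 1, 0), (2, 1, 1)}
P_out = {(1, 1, 3), (2, 1, 5)}
print(time_measures(P_in, P_out))  # (5, 3, 4)
```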

Since we consider the (potentially long) wires and the mechanism controlling the I/O of an algorithm as a valuable resource, we prefer schemas that use every pin in every step of a time interval. We call such a schema 1-compact, or simply compact. If only a fraction $1/c$ of all input (or output) possibilities is used during the schema's term, we call the schema $c$-compact.

Definition 6. A schema $P$ with $n$ elements using $r$ pins is called $c$-compact if $r(T_f(P) - T_s(P) + 1) \le cn$. We call 1-compact schemas compact. A schema $P$ is called a one-shot schema if every input bit $x_i$ occurs only once in $P$.
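Definition 6 can be checked mechanically. In the sketch below (hypothetical helper names), $r$ is taken to be the number of pins the schema actually uses, and the function returns the smallest $c$ for which the schema is $c$-compact:

```python
from collections import Counter
from fractions import Fraction


def compactness(schema):
    """Smallest c with r * (T_f(P) - T_s(P) + 1) <= c * n (Definition 6)."""
    n = len({i for (i, _, _) in schema})  # number of elements
    r = len({j for (_, j, _) in schema})  # pins actually used by the schema
    times = [t for (_, _, t) in schema]
    return Fraction(r * (max(times) - min(times) + 1), n)


def is_one_shot(schema):
    """True if every input bit x_i occurs exactly once in the schema."""
    return all(c == 1 for c in Counter(i for (i, _, _) in schema).values())


dense = {(1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 2, 2)}  # every pin busy in every step
sparse = {(1, 1, 1), (2, 1, 3)}                       # pin p_1 idle at time 2
print(compactness(dense), compactness(sparse))        # 1 3/2
```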

Such a rigid notion of input and output behavior is not suitable for average-efficient circuits, where some outputs may occur earlier. We allow earlier outputs if they do not disturb the chronological order of outputs at a pin.

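Definition 7. A relaxed schema $RP(P) \subseteq 2^{[n] \times [m] \times \mathbb{N}}$ is defined by a schema $P$ and a set $\mathcal{F}$ of functions $f: [m] \times \mathbb{N} \to \mathbb{N}$ that are strictly monotone increasing w.r.t. time, the second parameter:

$P' \in RP(P) :\iff \exists f \in \mathcal{F}: P' = \{(i, j, f(j,t)) \mid (i,j,t) \in P\}$.

A $\delta$-relaxed schema $RP_\delta(P) \subseteq RP(P)$ is a relaxed schema where the set of functions $\mathcal{F}$ is restricted by $f(j,t) \le t + \delta$ for all $j, t$.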
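Definition 7 can be tested pin by pin: the elements must keep their order at each pin, the new time stamps must stay strictly increasing, and no element may move later than $\delta$ steps. The sketch assumes that a schema places at most one element per pin and time step; the helper names are hypothetical:

```python
from collections import defaultdict


def per_pin(schema):
    """Group a schema by pin: pin -> list of (time, element) sorted by time."""
    pins = defaultdict(list)
    for i, j, t in schema:
        pins[j].append((t, i))
    return {j: sorted(entries) for j, entries in pins.items()}


def is_delta_relaxation(schema, relaxed, delta):
    """Check relaxed = {(i, j, f(j, t)) | (i, j, t) in schema} for some f that is
    strictly increasing in t and satisfies f(j, t) <= t + delta."""
    old, new = per_pin(schema), per_pin(relaxed)
    if old.keys() != new.keys():
        return False
    for j in old:
        if [i for _, i in old[j]] != [i for _, i in new[j]]:
            return False  # chronological order at pin p_j must be preserved
        new_times = [t for t, _ in new[j]]
        if any(a >= b for a, b in zip(new_times, new_times[1:])):
            return False  # f must be strictly increasing in time
        if any(tn > to + delta for (to, _), (tn, _) in zip(old[j], new[j])):
            return False  # no element may be delayed by more than delta steps
    return True


P = {(1, 1, 2), (2, 1, 3), (3, 2, 2)}
print(is_delta_relaxation(P, {(1, 1, 1), (2, 1, 3), (3, 2, 2)}, 0))  # True: x_1 comes earlier
```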

There are situations where it is reasonable to allow an output schema to change the topological and chronological order at the output pins. We call these kinds of schemas general relaxed schemas, defined as a specific union of schemas (we restrict ourselves to reasonable schemas, where the position and time of the transmitted string can be easily determined).

3 Efficient Schemas for the Addition 

3.1 Definitions 

First of all, we assume that for given binary input numbers $a, b \in \{0,1\}^n$ the bits $a_i$ and $b_i$ arrive at a logic block at the same time. In the next step this logic block computes $x_i \in \{$pro, gen, del$\}$, using the mapping $(0,0) \mapsto$ del, $(0,1) \mapsto$ pro, $(1,0) \mapsto$ pro, $(1,1) \mapsto$ gen, called the {pro, gen, del}-representation of $a$ and $b$. Then the standard parallel prefix operation for addition has to be computed on the vector $x$ to determine the result of the addition. For this we define the operator $\otimes: \{$pro, gen, del$\}^* \to \{$pro, gen, del$\}$ by $\otimes(\varepsilon) :=$ pro for the empty string $\varepsilon$, and for any string $w \in \{$pro, gen, del$\}^*$ we define $\otimes(w\,\text{pro}) := \otimes(w)$, $\otimes(w\,\text{gen}) :=$ gen, and $\otimes(w\,\text{del}) :=$ del. The standard parallel prefix operator on an input $x \in \{$pro, gen, del$\}^n$ is defined by $PP_\otimes(x) := y_1 \dots y_n$ where $y_i = \otimes(x_1 \dots x_i)$. The sequence $(y_i)_{i \in [n]}$ directly describes the sum $a + b$.
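The operator $\otimes$ and $PP_\otimes$ can be simulated sequentially to check that the prefix result indeed determines the sum. This sketch assumes that the bits are indexed from the least significant position, so $y_i =$ gen means that a carry leaves position $i$. A circuit would evaluate the prefix in parallel rather than in the loop used here, and all names are illustrative:

```python
PRO, GEN, DEL = "pro", "gen", "del"
ENCODE = {(0, 0): DEL, (0, 1): PRO, (1, 0): PRO, (1, 1): GEN}


def otimes(acc, x):
    """Extend the operator by one symbol: pro keeps the previous value, gen/del overwrite it."""
    return acc if x == PRO else x


def parallel_prefix(xs):
    """PP(x) = y_1 ... y_n with y_i = otimes(x_1 ... x_i), evaluated left to right."""
    ys, acc = [], PRO  # value on the empty string is pro
    for x in xs:
        acc = otimes(acc, x)
        ys.append(acc)
    return ys


def add(a, b, n):
    """Add two n-bit numbers; the least significant bit is x_1."""
    a_bits = [(a >> k) & 1 for k in range(n)]
    b_bits = [(b >> k) & 1 for k in range(n)]
    ys = parallel_prefix([ENCODE[p] for p in zip(a_bits, b_bits)])
    total, carry = 0, 0
    for k in range(n):
        total |= (a_bits[k] ^ b_bits[k] ^ carry) << k
        carry = 1 if ys[k] == GEN else 0  # y_k = gen  <=>  a carry leaves position k
    return total | (carry << n)


print(parallel_prefix([PRO, GEN, PRO, DEL, PRO]))  # ['pro', 'gen', 'gen', 'del', 'del']
print(add(11, 6, 4))  # 17
```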

Given the precomputed vector $(x_i)$, we investigate the time and area complexity of the schemas bit-parallel word-serial (WS for short), bit-serial word-parallel (WP), bit-serial snakelike word-parallel (SWP), snakelike bit-parallel word-serial (SWS), and offset bit-parallel word-serial (OWS) (see Fig. 3). For converting a given schema into another one, for example WS into WP, see e.g. [5].




Fig. 3. Graphical description of the schemas, from left to right: a) bit-parallel word-serial, b) bit-serial word-parallel, c) bit-serial snakelike word-parallel, d) snakelike bit-parallel word-serial, e) offset bit-parallel word-serial, and f) relaxed bit-serial snakelike word-parallel. In these diagrams time grows from bottom to top.



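Definition 8. Let $x_1, \dots, x_n$ be the input or output string. For $r$ pins and $t = \lceil n/r \rceil$ we define the following schemas.

$r \times t\text{-WS} := \{(i, \lceil i/t \rceil, 1 + (i-1) \bmod t) \mid i \in [n]\}$ (1)

$r \times t\text{-WP} := \{(i, 1 + (i-1) \bmod r, \lceil i/r \rceil) \mid i \in [n]\}$ (2)

$r \times t\text{-SWP} := \{(i, 1 + (i-1) \bmod r, \lceil i/r \rceil) \mid i \in [n] \text{ and } \lceil i/r \rceil \text{ is odd}\} \cup \{(i, r - (i-1) \bmod r, \lceil i/r \rceil) \mid i \in [n] \text{ and } \lceil i/r \rceil \text{ is even}\}$ (3)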

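$(r,t,d)\text{-SWS} := \{(i, 1 + (\lceil i/d \rceil - 1) \bmod r, (i-1) \bmod d + d\lceil i/(dr) \rceil) \mid i \in [n] \text{ and } \lceil i/(dr) \rceil \text{ is odd}\} \cup \{(i, r - (\lceil i/d \rceil - 1) \bmod r, (i-1) \bmod d + d\lceil i/(dr) \rceil) \mid i \in [n] \text{ and } \lceil i/(dr) \rceil \text{ is even}\}$ (4)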



$r$-OWS is a general relaxed schema with $r$ pins, starting times $T_s(P,j)$ and lengths $t_j := T_f(P,j) - T_s(P,j) + 1$, where $n = \sum_j t_j$. The $i$-th non-empty element at pin $p_j$ corresponds to the element $x_k$ for $k = \sum_{j' < j} t_{j'} + i$. (5)
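Equations (1) to (3) translate directly into generators of $(i, j, t)$ triples. The sketch assumes that the brackets in those formulas denote ceilings and that $t = \lceil n/r \rceil$; the function names are illustrative:

```python
def ceil_div(a, b):
    """Integer ceiling of a / b without floating point."""
    return -(-a // b)


def ws(n, r):
    """r x t-WS, equation (1): x_i goes to pin ceil(i/t) at time 1 + (i-1) mod t."""
    t = ceil_div(n, r)
    return {(i, ceil_div(i, t), 1 + (i - 1) % t) for i in range(1, n + 1)}


def wp(n, r):
    """r x t-WP, equation (2): x_i goes to pin 1 + (i-1) mod r at time ceil(i/r)."""
    return {(i, 1 + (i - 1) % r, ceil_div(i, r)) for i in range(1, n + 1)}


def swp(n, r):
    """r x t-SWP, equation (3): like WP, but even time steps run through the pins backwards."""
    schema = set()
    for i in range(1, n + 1):
        step = ceil_div(i, r)
        pin = 1 + (i - 1) % r if step % 2 == 1 else r - (i - 1) % r
        schema.add((i, pin, step))
    return schema


print(sorted(swp(6, 3)))  # [(1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 3, 2), (5, 2, 2), (6, 1, 2)]
```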
3.2 Upper Bounds

Theorem 1. In the grid model the addition of two $n$-bit numbers with input schema $P_I$ and output schema $P_O$ can be computed in

a) $A \in O(n)$, $T \in O(r + t)$, $\Delta_s, \Delta_f \in O(r + t)$ for $r \times t$-WS $P_I, P_O$,

b) $A \in O(r^2)$, $T \in O(r + t)$, $\Delta_s, \Delta_f \in O(r)$ for $r \times t$-SWP $P_I, P_O$,

c) $A \in O(dr + r^2)$, $T \in O(d + r + t)$, $\Delta_s, \Delta_f \in O(d + r)$ for $(r,t,d)$-SWS $P_I, P_O$,

d) $A \in O(r \log t)$, $T \in O(r + t)$, $\Delta_s = 1$, $\Delta_f \in O(r + t)$ for $r \times t$-WS $P_I$ and $r$-OWS $P_O$.

Sketch of the algorithms: For all algorithms, the pins are located only in $(g_{0,j})_{j \in [m_2]}$.

a) The algorithm for the WS I/O schema consists of three phases. In the first phase the algorithm stores the input of each pin $p_i$ in a FIFO data structure $Q_i$. Furthermore, it applies $\otimes$ to all inputs received at $p_i$, yielding $z_i = \otimes(x_{(i-1)t+1} \cdots x_{it})$. Then it computes $PP_\otimes(z_1 \dots z_r) =: k_1 \dots k_r$. Finally, it generates the output for each pin $p_i$ by applying the prefix operator $PP_\otimes$ with initial value $k_{i-1}$ to the elements in $Q_i$.
d) We use a similar strategy for the OWS output schema. In the first phase the algorithm counts the number $c_i$ of leading pros at each pin $p_i$. While this counting takes place, no output is generated at $p_i$. Once the algorithm receives a non-propagate symbol at $p_i$, it generates the output on-line at $p_i$, determined by the rest of the input on $p_i$. As for the WS output schema, it computes the partial results $z_i$. In the second phase it computes $PP_\otimes(z_1 \dots z_r) =: k_1 \dots k_r$. Finally, it generates $c_{i+1}$ copies of $k_i$ on pin $p_i$; these are the outputs for the leading pros of the next block, which the offset schema appends to the output stream of $p_i$.

b) For the SWP I/O schema we pipeline the input through an $r \times 2r$ field. For the first row we define the internal values $z^1_i := x_i$, which appear according to the SWP schema. In the $\ell$-th row we calculate $z^\ell_i := \otimes(z^{\ell-1}_{i-1} z^{\ell-1}_i)$, so that $z^\ell_i = \otimes(x_{i-\ell+1} \cdots x_i)$. Note that diagonal connections can be simulated by an appropriate encoding, e.g. by increasing the alphabet $\Sigma$. In the last row of the field no pro values remain unless the input contains a pro-chain of length at least $2r$. If $z^{2r}_i =$ pro, then the last non-pro value that was computed at this logic block of row $2r$ determines the output value $y_i$. Using standard techniques one can fold this field such that inputs and outputs are placed on the same side of the array.

c) The algorithm for the SWS I/O schema is a straightforward combination of the algorithms for the SWP and the WS schema. □
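The data flow of the three-phase WS algorithm in a) can be checked with a sequential simulation that ignores timing and area. In this sketch (hypothetical names), `blocks[i]` holds the elements that pin $p_{i+1}$ receives, and the concatenated outputs should equal $PP_\otimes$ of the whole input:

```python
PRO = "pro"


def prefix(xs, init=PRO):
    """Prefix with initial value `init`: pro keeps the running value, gen/del overwrite it."""
    acc, ys = init, []
    for x in xs:
        acc = acc if x == PRO else x
        ys.append(acc)
    return ys


def ws_prefix(blocks):
    """Three-phase WS algorithm; blocks[i] is the FIFO Q_{i+1} filled by pin p_{i+1}."""
    # Phase 1: every pin reduces its own block to a single summary z_i.
    z = [prefix(q)[-1] if q else PRO for q in blocks]
    # Phase 2: prefix over the block summaries gives k_1 ... k_r.
    k = prefix(z)
    # Phase 3: rerun the prefix on each FIFO, seeded with k_{i-1} (pro for the first pin).
    return [prefix(q, init=k[i - 1] if i > 0 else PRO) for i, q in enumerate(blocks)]


blocks = [["pro", "gen"], ["pro", "pro"], ["del", "pro"]]
print(ws_prefix(blocks))  # [['pro', 'gen'], ['gen', 'gen'], ['del', 'del']]
```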

For the Δ-wired grid model we omit the WS and the SWS schemas, since they do not provide more efficient algorithms than the WP and the OWS schemas, either in the worst case or in the average case.

Theorem 2. For Δ > 0, in the Δ-wired model the addition of two $n$-bit numbers can be performed with input schema $P_I$ and output schema $P_O$ as follows:

1. if $P_I$ and $P_O$ are $r \times t$-WP schemas, then $A \in O(\min(n + \ldots, r \log r))$, $T \in O(t + \log r)$, and $\Delta_s, \Delta_f \in O(\log r)$;

2. if $P_I$ is an $r \times t$-WS and $P_O$ an $r$-OWS schema, then $A \in O(r \log t + \ldots)$, $T \in O(t + \log r)$, $\Delta_s \in O(1)$, and $\Delta_f \in O(\log t)$.

The algorithms for the Δ-wired model correspond to the grid-based algorithms, except that we use a tree structure as described in [4] to compute the carries of input elements received in parallel (for the WP schema) or to compute $PP_\otimes(z_1 \dots z_r)$ (for the OWS schema). The algorithm presented in [4] pipelines through a tree structure to improve the area. For the OWS schema we use the carry computation tree only once, to compute $PP_\otimes(z_1 \dots z_r)$. Hence, we can reduce the area needed to implement the tree structure by a factor of Δ. This does not help for the WP schema, since we cannot compress the information processed in the pipeline. The following corollary improves the $AT^2$ bound of $O(n \log^2 n)$ shown in [4] for a high connectivity parameter Δ (where $\mathrm{llog}\, n := \log\log n$).

Corollary 1. In the $(\log n / \mathrm{llog}\, n)$-wired model the addition of two $n$-bit numbers with input schema $(n/\log n) \times \log n$-WS and output schema $(n/\log n)$-OWS needs area $A = O(n\, \mathrm{llog}\, n / \log n)$ and over-all time $T = O(\log n)$; thus $AT^2 = O(n \log n\, \mathrm{llog}\, n)$.
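As a worked check of the arithmetic in Corollary 1: with $r = n/\log n$ pins and $t = \log n$ steps, the $r \log t$ term of the area bound and the product $AT^2$ evaluate as follows.

```latex
r \log t = \frac{n}{\log n}\,\log\log n = \frac{n\,\mathrm{llog}\,n}{\log n},
\qquad
A\,T^2 = O\!\left(\frac{n\,\mathrm{llog}\,n}{\log n}\right)\cdot O\!\left(\log^2 n\right)
       = O\!\left(n \log n\,\mathrm{llog}\,n\right).
```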

3.3 Lower Bounds 

Here we investigate whether improved I/O schemas exist that enable more efficient adders than those presented in the previous section. The only restrictions are that either the input and output behaviors are similar or both schemas are compact. The following two theorems show that the presented schemas and algorithms are indeed optimal w.r.t. time and area. Since both theorems use similar arguments, we present a single proof sketch after Theorem 4.

Theorem 3. In the grid model, any algorithm adding two $n$-bit numbers using $r$ pins and schema duration $t$ has an over-all time of at least … for any input schema and at least $t + r$ for any one-shot compact input schema. Furthermore, it needs

1) area at least $\Omega(\min(r^2, n) - \delta r)$ for any one-shot compact $\delta$-relaxed input schema and relaxed output schema of the same type;

2) area at least $\Omega(\min(r^2, n))$ for any one-shot compact input schema and any one-shot compact output schema;

3) area at least $\Omega(n)$ for $\alpha > 1$ and $2\alpha t < \sqrt{n}$ for any one-shot compact input schema and one-shot $\alpha$-compact output schema.

Theorem 4. In the Δ-wired grid model, any algorithm adding two $n$-bit numbers using $r$ pins and schema duration $t$ has an over-all time of at least $t + \log r$ for any one-shot compact input schema. Furthermore, it needs

1) area at least $\Omega(\min(r \log r, n) - \delta r)$ for any one-shot compact $\delta$-relaxed input schema and relaxed output schema of the same type;

2) area at least $\Omega(\min(r \log r, n))$ for any one-shot compact input schema and any one-shot compact output schema;

3) area $\Omega(n)$ for $\alpha > 1$ and $2\alpha t < \log n$ for any one-shot compact input schema and one-shot $\alpha$-compact output schema.

Proof sketch. 1.: Let $P_I$ be a one-shot compact input schema and $P_O \in RP(P_I)$. Let

$A := \{i \mid \exists j: (i,j,0) \in P_I \text{ and } |\{k < i \mid \exists j': (k,j',0) \in P_I\}| \ge r/2 - 1\}$,

$B := \{i \mid \exists j, \ell: (i,j,\ell) \in P_I \text{ and } \ell > t/2\}$,

$B_1 := \{i \mid \exists \ell: (i,j,\ell) \in P_O\} \cap B$.

Note that $|A| \ge r/2$. For $i \in A$ define $C_i := \{j \notin B \mid \forall k \in \{j+1, \dots, i\}: k \notin B\}$. We choose $x_i :=$ pro for all $i \notin B$, and for all $i \in B$ we select the value $x_i \in \{$del, gen$\}$ such that the inputs in $B$ have high Kolmogorov complexity.

Note that each output $y_i$ with $i \in A$ depends either on an input bit in $B$ or on a pro-chain $C_i$ containing at least $r/4$ input elements received by the algorithm at the beginning on different pins. Hence, each of these outputs is delayed by at least $\min\{\log(r/4), t/2\}$ steps in the Δ-wired model, resp. by $\min\{r/4, t/2\}$ steps in the grid model. This implies that every output element at this pin $p_j$ is delayed by at least this number of steps. Therefore, the output elements corresponding to $B_1$ are also delayed and have to be stored. Because of the high Kolmogorov complexity of the input sequence addressed by $B$, the algorithm needs a memory of size at least $(r/2) \cdot \min\{\log(r/4), t/2\}$ for the Δ-wired model, resp. $\min\{r/4, n/4\}$ for the grid model.

A $\delta$-relaxation of the input schema w.r.t. a compact schema $P$ reduces this bound by $\delta r$, because elements addressed by $B$ can be delayed by $\delta$ steps in the input schema.
2. and 3.: Now let $P_I, P_O$ be one-shot compact schemas and let $i_1 < i_2 < \dots < i_r$ be the sequence of indices of the input elements which arrive at time step $t - 1$. Define

$A := \{i_1, \dots, i_r\}$, $B_1 := \{i \mid i_1 \le i \le i_{\lfloor r/2 \rfloor}\}$, $B_2 := \{i \mid i_{\lfloor r/2 \rfloor + 1} \le i \le i_r\}$,

$C_k := \{i \mid \exists j, \ell: (i,j,\ell) \in P_I \wedge \ell < k\}$, and $D_k := B_1$ if $|B_1 \cap C_k| < |B_2 \cap C_k|$, $D_k := B_2$ otherwise.

For all $i \in D_k$ we choose $x_i =$ pro, and for all $i \notin D_k$ we choose $x_i \in \{$del, gen$\}$ such that this sequence has high Kolmogorov complexity. Note that the last position $y_i$ in $D_k$ can only be determined after $t + \log(r/2)$ steps. Assume that $P_O$ is $\alpha$-compact with $1 < \alpha < \ldots$ for an appropriately chosen $k$. Then no algorithm can generate the first output element before time step $t + \log(r/2) - \alpha t \ge k$. Hence, the algorithm has to withhold the output corresponding to $C_k$. Note that $C_k$ addresses the input elements received within the first $k$ steps, and at least half of these are not in $D_k$ and possess high Kolmogorov complexity. This implies $A \in \Omega(\min(n, r \cdot k))$. By choosing $k = \log(r/2) + (1 - \alpha)t$ the claim follows. □



4 Efficient on the Average 

In this section we investigate average bounds for the over-all time and the follow-up time. Furthermore, we measure the area that is necessary with high probability (area w.h.p.), defined as follows. Consider a partition of the over-all area of an algorithm into two rectangles $P_0$ and $P_1$ with areas $A_0$ and $A_1$, where all pins are adjacent to $P_0$. The algorithm needs area $A_0$ w.h.p. if for a random input of length $n$ the communication between $P_0$ and $P_1$ consists only of special symbols $\beta$ with probability $1 - \ldots$ for some constant $c$.

Synchronizing output elements is an area-consuming task. For this reason, we use relaxed output schemas, allowing results to be given out as soon as they are available. However, all algorithms presented here, except the one for the OWS schema, produce compact (non-relaxed) output schemas w.h.p.

Lemma 1. For any $c > 0$, the probability that the {pro, gen, del}-representation $W$ of two uniformly chosen binary numbers $A, B \in \{0,1\}^n$ contains a contiguous propagate sequence of length $2(c+1) \cdot \log_3 n$ is bounded by ….
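Lemma 1 can be illustrated empirically (this is not a proof). Each position of the representation is pro with probability 1/2 and gen or del with probability 1/4 each. The sketch below, with a fixed seed and hypothetical helper names, prints the longest pro-run next to $\log_2 n$ for a few input lengths:

```python
import math
import random


def random_representation(n, rng):
    """{pro, gen, del}-representation of two uniformly random n-bit numbers."""
    by_pair = ("del", "pro", "pro", "gen")  # index 2*a_i + b_i: (0,0), (0,1), (1,0), (1,1)
    return [by_pair[rng.randrange(4)] for _ in range(n)]


def longest_pro_run(xs):
    """Length of the longest contiguous block of 'pro' symbols."""
    best = current = 0
    for x in xs:
        current = current + 1 if x == "pro" else 0
        best = max(best, current)
    return best


rng = random.Random(2001)
for n in (2**8, 2**12, 2**16):
    worst = max(longest_pro_run(random_representation(n, rng)) for _ in range(5))
    print(f"n = {n:6d}  longest pro-run = {worst:3d}  log2(n) = {math.log2(n):.0f}")
```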

The algorithms used for average-efficient addition are similar to the algorithms presented in Section 3, except that they generate outputs as soon as they are available. Intuitively, Lemma 1 implies that for the SWP algorithm presented above an $r \times O(\log n)$ field suffices w.h.p. For the OWS schema, the counters $c_i$ can be bounded by $O(\log n)$ w.h.p. For the relaxed WS and OWS output schemas, the algorithm has to resolve pro-chains $z_1 \dots z_n$, which are shorter than … w.h.p.



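Theorem 5. In the grid model the addition of two random $n$-bit numbers with input schema $P_I$ and output schema $P_O$ can be computed within time $T = t + \Delta_f$ and with the following expected follow-up time and area w.h.p.:

$P_I$: $r \times t$-WS, $P_O$: rel. $r \times t$-WS; $E[\Delta_f] \in O(t + \ldots)$, $A \in O(n)$ w.h.p.
$P_I$: $r \times t$-WS, $P_O$: $r$-OWS; $E[\Delta_f] \in O(\min(\log n, t + \ldots))$, $A \in O(r \min(\mathrm{llog}\, n, \log t))$ w.h.p.
$P_I$: $r \times t$-SWP, $P_O$: rel. $r \times t$-SWP; $E[\Delta_f] \in O(\min(\log n, r))$, $A \in O(r \min(\log n, r, t))$ w.h.p.
$P_I$: $(r,t,d)$-SWS, $P_O$: rel. $(r,t,d)$-SWS; $E[\Delta_f] \in O(d + \min(r, \ldots))$, $A \in O(r \min(d + r, \log n, t))$ w.h.p.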



It turns out that the algorithm addressed by the following lemma is a basic building block for the design of average-efficient adders for Δ-wired grids.



Lemma 2. There exists an $O(n\, \mathrm{llog}\, n)$-area and $O(\mathrm{llog}\, n)$-time bounded algorithm in the 1-wired grid model that computes the addition of two random $n$-bit numbers w.h.p. if the input and output are given in parallel ($n \times 1$-WP).

Proof sketch. We partition the input into blocks $U_i := x_{i\ell+1} \dots x_{(i+1)\ell}$ of length $\ell := c \log n$ for an appropriately chosen constant $c$ and compute $PP_\otimes(U_i U_{i+1})$ for each $i$ using the algorithm of [4]. Let $y_{(i+1)\ell+1}, \dots, y_{(i+2)\ell}$ be the suffix of the result of this computation. Lemma 1 implies that $y_1, \dots, y_n$ is the correct result w.h.p. □
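The block decomposition in the proof of Lemma 2 can be simulated sequentially. The sketch below uses a plain left-to-right prefix instead of the tree algorithm of [4], and all names are hypothetical. It is exact whenever no pro-run is longer than the block length $\ell$, and it can fail otherwise, which is the case handled by the worst-case fallback:

```python
PRO, GEN, DEL = "pro", "gen", "del"


def prefix(xs):
    """Sequential reference prefix (pro keeps the running value, gen/del overwrite it)."""
    acc, ys = PRO, []
    for x in xs:
        acc = acc if x == PRO else x
        ys.append(acc)
    return ys


def blockwise_prefix(xs, ell):
    """Lemma 2 idea: the results for block U_{i+1} are read off the prefix of U_i U_{i+1}."""
    ys = prefix(xs[:ell])  # the first block has no predecessor
    for start in range(ell, len(xs), ell):
        window = xs[start - ell:start + ell]  # U_i followed by U_{i+1}
        ys.extend(prefix(window)[ell:])       # keep only the suffix belonging to U_{i+1}
    return ys


short_runs = [GEN, PRO, DEL, PRO, GEN, PRO, PRO, DEL, GEN]
long_run = [GEN, PRO, PRO, PRO, PRO, PRO]
print(blockwise_prefix(short_runs, 3) == prefix(short_runs))  # True
print(blockwise_prefix(long_run, 2) == prefix(long_run))      # False
```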

Using the worst-case algorithms, one can improve time and area by replacing all subroutines computing $PP_\otimes$ by the algorithm of Lemma 2. To guarantee correctness if a propagate chain of length $c \log n + 1$ occurs, the computation is delayed in such a case, and a worst-case algorithm takes over for the rest of the calculation.

Theorem 6. In the Δ-wired model, adding two random $n$-bit numbers with input schema $P_I$ and output schema $P_O$ can be done in over-all time $T = t + \Delta_f$, where

$E[\Delta_f] \in O(\min(\log n, t + \mathrm{llog}\, n))$ and $A \in O(r(\min(\log t, \mathrm{llog}\, n) + \ldots))$ w.h.p.
for $r \times t$-WS $P_I$ and $r$-OWS $P_O$, resp.

$E[\Delta_f] \in O(\min(\log r, \mathrm{llog}\, n))$ and $A \in O(r \min(\log r, \mathrm{llog}\, n, t + \ldots))$ w.h.p.
for $r \times t$-WP $P_I$ and rel. $r \times t$-WP $P_O$.

For a generalization, we observe that only the algorithm constructed for the OWS output schema uses a special property of the addition. All other bounds presented here can be generalized to a broader class of prefix functions. The worst-case upper bounds presented in Section 3 hold for any prefix function. The average bounds presented in Section 4 hold for a subclass of these functions, the so-called confluent prefix functions, introduced in [17]. Furthermore, the algorithms provide the same average complexity measures if the input probability distribution is generalized to binomially approximable distributions as discussed in [13]. The lower bounds presented in Section 3 also apply to the diffluent prefix functions introduced in [13]. Because of space limitations we omit the definitions of diffluent and confluent prefix functions.

As already noted, the algorithms for the offset schemas cannot be applied directly to general prefix functions. The following theorem shows that there exists an alternative average-efficient algorithm using this I/O schema for confluent prefix functions.
Theorem 7. In the grid model, a confluent prefix function on a random input of length $n$ with $r \times t$-WS input schema and $r$-offset output schema can be computed within area $O(r \min(\sqrt{\log n}, \sqrt{t}))$ w.h.p. and expected follow-up time $O(\min(\log n, t + \ldots))$. In the Δ-wired grid model the corresponding area bound w.h.p. is $O(r(\min(\sqrt{\log n}, \sqrt{t}) + \ldots))$, while the expected follow-up time is $O(\min(\log n, t + \mathrm{llog}\, n))$.

The key idea of the proof is to replace the counters of the corresponding adders by an 
area efficient data structure for storing intermediate values. 

5 Conclusions 

The results presented above can be used to optimize the average behavior of an FPGA if an appropriate schema is chosen. Table 1 summarizes the results of Theorem 5. In Table 1 we optimize the time implied by the output schema; if schemas have equal asymptotic expected time behavior, we present the one with smaller area. The expected follow-up time of the relaxed SWS schema is minimized by choosing $d = \min(\sqrt{\log n}, r, t)$. From Theorem 6 it follows that for the Δ-wired grid model there is no asymptotic difference in the expected over-all time $E[T] = t + E[\Delta_f]$ between the relaxed WP and the OWS output schema. Only for $t \in o(\log n)$ and connectivity parameter $\Delta \in \omega(1)$ does the OWS schema provide a more area-efficient algorithm than the relaxed WP. For other parameters it seems that the relaxed WP schema should be preferred. Note that the corresponding algorithm produces a compact (non-relaxed) WP output schema w.h.p.



Table 1. Output schemas which allow expected time efficient algorithms for grids.

$r \in O(\sqrt{\log n})$: rel. SWP, rel. SWS; $E[\Delta_f] \in O(r)$, $A \in O(r^2)$
$r \in \omega(\sqrt{\log n}) \cap o(n/\sqrt{\log n})$: rel. SWS; $E[\Delta_f] \in O(\sqrt{\log n})$, $A \in O(r \min(\log n, r, t))$ w.h.p.
$r \in \Omega(n/\sqrt{\log n}) \cap o(n)$: OWS; $E[\Delta_f] \in O(\ldots)$, $A \in O(r \log t)$ w.h.p.
$r \in \Theta(n)$: rel. WS, rel. SWP, rel. SWS, OWS; $E[\Delta_f] \in O(\log n)$, $A \in O(n)$
Abstract. Consider a directed graph $G = (V, E)$ with $n$ vertices and a root vertex $r \in V$. The DMDST problem for $G$ is that of constructing a spanning tree rooted at $r$ whose maximal degree is the smallest among all such spanning trees. The problem is known to be NP-hard. A quasi-polynomial time approximation algorithm for this problem is presented. The algorithm finds a spanning tree whose maximal degree is at most $O(\Delta^* + \log n)$, where $\Delta^*$ is the degree of some optimal tree for the problem. The running time of the algorithm is shown to be quasi-polynomial. Experimental results are presented showing that the actual running time of the algorithm is much smaller in practice.



1 Introduction 

The minimum-degree spanning tree (MDST) problem for an undirected graph $G = (V,E)$ is that of constructing a spanning tree of $G$ whose maximal degree is the smallest among all spanning trees of $G$. It is a generalization of the Hamiltonian Path problem and thus is also NP-hard. The problem can be defined on directed graphs as follows. Given a root vertex $r \in V$, find an incoming (or outgoing) spanning tree rooted at $r$, known as a branching, in which the maximal indegree (outdegree) of a vertex is minimized. We will refer to the directed version of the MDST problem as the DMDST problem. In the Steiner case, a set of terminals $D \subseteq V$ is also specified. The output tree must span the set $D$, but it may contain any of the other vertices of $G$.

The general problem of computing low-degree trees is both fundamental and of ready applicability, for example in non-critical broadcast and VLSI layout problems. It is also inherently appealing due to its seeming simplicity. Previous polynomial-time algorithms [2] for the DMDST problem find trees whose degree is at most $O(\Delta^* \log n)$, i.e., a factor of $\log n$ from the optimal degree $\Delta^*$. On the other hand, the algorithm in [4] for undirected graphs finds a tree whose degree is at most $\Delta^* + 1$, i.e., an additive constant 1 away from the optimal. Our work tries to bridge this gap in performance between approximation algorithms for the undirected and directed versions of the MDST problem.

* Research supported by the National Science Foundation under grant CCR-9820902. 
Our algorithm for the DMDST problem finds a tree whose maximal degree is at most $c\Delta^* + \lceil \log_c n \rceil = O(\Delta^* + \log n)$, for any constant $c > 1$. The approximation quality has therefore been improved from a multiplicative factor of $\log n$ to an additive term of $\log n$. The running time of the algorithm is shown to be quasi-polynomial. However, it is conjectured that a better bound may be provable, and also that for this reason the running time may be much better in practice. We present experimental evidence showing that in practice the algorithm ran much faster than the theoretical bound obtained here. Also, the degree of the output tree is often very close to the optimal degree.
Previous work in undirected graphs. The first result on approximating a minimum-degree spanning tree was that of Fürer and Raghavachari [2]. They gave a polynomial-time approximation algorithm that returns a tree whose degree is at most $O(\Delta^* \log n)$. The first polynomial-time approximation algorithm for the Steiner version of the problem was provided by Agrawal, Klein and Ravi [1]. Their approximation ratio is $O(\log |D|)$, where $D$ is the set of terminals. Ravi, Raghavachari and Klein [12] studied generalizations of the MDST problem and gave quasi-polynomial time approximation algorithms. Fürer and Raghavachari [4] improved their previous results and provided a new polynomial-time algorithm to approximate the MDST problem to within one of optimal. Their algorithm also extends to the Steiner version of the problem, but only works on undirected graphs. A survey of these and other minimum-degree problems has appeared in a book on approximation algorithms [9].

Ravi, Marathe, Ravi, Rosenkrantz and Hunt [11] proposed algorithms for computing low-weight bounded-degree subgraphs satisfying given connectivity properties. Given a graph $G$ with nonnegative weights on the edges and a degree bound $\Delta$, their algorithm computes a spanning tree of $G$ whose degree is at most $O(\Delta \log \ldots)$ and whose weight is at most $O(\log n)$ times the weight of a minimum-weight tree with degree at most $\Delta$. Their techniques extend to the Steiner tree and generalized Steiner tree problems with the same ratio. They also studied special cases where the edge weights satisfy the triangle inequality and presented efficient algorithms for computing subgraphs that have low weight and small bottleneck cost. More recently, Könemann and Ravi [7] have given an algorithm that finds a tree of degree $O(\Delta + \log n)$ whose cost is at most $O(1)$ times the cost of an optimal tree of degree $\Delta$.

Given a graph $G$ and an independent set of nodes $I$, the problem of finding a spanning tree that minimizes the maximum degree of any node in $I$ is solvable in polynomial time [8]. Gavish [5] formulated the MDST problem as a mixed integer program and provided an exact solution using the method of Lagrangian multipliers. Ravi [10] presented an approximation algorithm for the problem of finding a spanning tree whose diameter plus maximal degree is minimal.
Previous work in directed graphs. Fürer and Raghavachari [2] showed that their algorithm for computing low-degree trees further generalizes to finding branchings in directed graphs. Their algorithm builds a tree in stages by taking the union of a sequence of low-degree forests (e.g., matchings), and the degree of the resulting tree is shown to be $O(\Delta^* \log n)$. This paper improves the performance from a multiple of $\log n$ to an additive term of $\log n$.



2 Definitions and Notation 

The input is an arbitrary directed graph $G = (V,E)$ and a root vertex $r \in V$. Let $n$ be the number of vertices in $G$. It is assumed that $r$ is reachable from all vertices of $G$. Let $T^*$ be an optimal DMDST whose maximal degree is $\Delta^*$.

A branching rooted at $r$ is a subgraph of $G$ whose underlying undirected graph is a spanning tree and which has a directed path from every vertex to $r$. In a branching, each vertex other than $r$ has exactly one outgoing edge, and $r$ has no outgoing edges. It is easily shown that the only subgraphs with $n - 1$ edges in which there is a directed path from every vertex to $r$ are the branchings rooted at $r$. Such a branching is also known as an in-branching. One can also define an out-branching, in which $r$ can reach every vertex of $G$ through directed paths. In this paper, a branching always refers to an in-branching. However, our algorithm can be easily modified to find out-branchings with small outdegree.
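The characterization of branchings suggests a direct check. A Python sketch, assuming the candidate tree is given as a dictionary `parent` that maps every vertex other than $r$ to the head of its outgoing edge, and the graph as a set of directed edges; all names are illustrative, and the cycle check is quadratic in the worst case:

```python
def is_in_branching(vertices, root, parent, edges):
    """Check that `parent` (v -> head of v's unique outgoing edge) is an
    in-branching of the directed graph (vertices, edges) rooted at `root`."""
    if root in parent or set(parent) != set(vertices) - {root}:
        return False  # r has no outgoing edge, every other vertex has exactly one
    if any((v, w) not in edges for v, w in parent.items()):
        return False  # every tree edge must be an edge of G
    for v in vertices:
        seen = set()
        while v != root:  # follow outgoing edges; they must reach r without a cycle
            if v in seen or v not in parent:
                return False
            seen.add(v)
            v = parent[v]
    return True


V = {0, 1, 2, 3}
E = {(1, 0), (2, 1), (3, 1), (2, 3), (3, 2)}
print(is_in_branching(V, 0, {1: 0, 2: 1, 3: 1}, E))  # True
print(is_in_branching(V, 0, {1: 0, 2: 3, 3: 2}, E))  # False: 2 and 3 form a cycle
```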

Let $T$ be a branching. For each edge $(v, w)$ in $T$, we call $w$ the parent of $v$, denoted by $p[v]$. Since every vertex except $r$ has a unique outgoing edge, each vertex other than $r$ has a unique parent, and $r$ has none. The reflexive and transitive closure of the parent function yields the ancestor relation. In other words, $v$ is an ancestor of $u$ if there is a directed path in the branching from $u$ to $v$. We call $u$ a descendant of $v$ if $v$ is its ancestor. We say that two vertices $v$ and $w$ are related if either $v$ is an ancestor of $w$, or vice versa; otherwise, we say that the vertices are unrelated. For any two unrelated vertices $v$ and $w$, the least common ancestor is the ancestor closest to $v$ that is also an ancestor of $w$. The subtree rooted at $v$ consists of all vertices, including $v$, for which $v$ is an ancestor.
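The tree relations above can be computed from the parent function alone. A small sketch with hypothetical helper names, assuming the branching is stored as a dictionary that maps each non-root vertex $v$ to $p[v]$; it walks up parent pointers, so each query costs time proportional to the depth of the branching:

```python
def ancestors(v, parent):
    """v and all its ancestors, ordered from v up to the root (reflexive closure of p)."""
    chain = [v]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def related(v, w, parent):
    """True if one of v, w is an ancestor of the other."""
    return v in ancestors(w, parent) or w in ancestors(v, parent)


def least_common_ancestor(v, w, parent):
    """The ancestor of v closest to v that is also an ancestor of w."""
    ancestors_of_w = set(ancestors(w, parent))
    return next(u for u in ancestors(v, parent) if u in ancestors_of_w)


def subtree(v, parent, vertices):
    """All vertices, including v, for which v is an ancestor."""
    return {u for u in vertices if v in ancestors(u, parent)}


# Branching on {0, ..., 4} rooted at r = 0: p[1] = 0, p[2] = 1, p[3] = 1, p[4] = 0.
p = {1: 0, 2: 1, 3: 1, 4: 0}
print(least_common_ancestor(2, 3, p), least_common_ancestor(2, 4, p))  # 1 0
print(sorted(subtree(1, p, range(5))))                                # [1, 2, 3]
```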

The degree of a vertex in a given branching is the number of edges coming into that vertex; we may also refer to it as its indegree. For a branching, let $S_\Delta$ be the set of all vertices whose degree is $\Delta$ or more. The degree of a branching is the maximum degree of any of its vertices. Our goal is to find a branching of as small a degree as possible.
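Degrees and the set $S_\Delta$ follow from counting how often each vertex occurs as a parent. A sketch with illustrative names, assuming the branching is stored as a dictionary that maps each non-root vertex $v$ to $p[v]$ and that $\Delta \ge 1$:

```python
from collections import Counter


def indegrees(parent):
    """Indegree of every vertex that has at least one child in the branching."""
    return Counter(parent.values())


def branching_degree(parent):
    """Maximum indegree over all vertices (0 for a branching without edges)."""
    return max(indegrees(parent).values(), default=0)


def high_degree_vertices(parent, delta):
    """S_Delta: the vertices whose degree is delta or more (assumes delta >= 1)."""
    return {v for v, d in indegrees(parent).items() if d >= delta}


p = {1: 0, 2: 1, 3: 1, 4: 0, 5: 1}
print(branching_degree(p), sorted(high_degree_vertices(p, 2)))  # 3 [0, 1]
```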

3 MDST Problem: Directed vs Undirected Graphs 

Our algorithm is based on an algorithm proposed by Fürer and Raghavachari [3] for undirected graphs that finds a tree whose degree is $O(\Delta^* + \log n)$. Their algorithm starts with an arbitrary spanning tree of $G$ and iteratively decreases the degree of high-degree vertices by applying “improvement” steps. An improvement step replaces an edge incident to a high-degree node by another edge that keeps the tree connected. They applied improvement steps to high-degree nodes repeatedly, until no improvement was possible at these nodes. In order to extend this algorithm to directed graphs, we make several modifications.
Fig. 1. Directed versus undirected graphs



First, in directed graphs, we face the following problem. Suppose an edge is removed that splits the current tree into two trees, namely P and Q (see Fig. 1). Suppose we try to combine the two trees by adding the edge (x, y), where x ∈ P and y ∈ Q. In the case of undirected graphs, any such edge would do and we get back a spanning tree of G. But in the case of directed graphs, the vertex x must be the root of the tree P; otherwise the procedure would not yield a branching of G. As shown in Fig. 1(a), undirected trees P and Q can be merged into a single tree by adding either the edge (x, y) or the edge (a, b). But in the directed case, as shown in Fig. 1(b), only the edge (x, y) yields a branching. If (a, b) is added, then a has two outgoing edges, and the resulting graph is not a branching. Therefore the improvement step needs to be modified.
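The merging rule can be checked mechanically. The sketch below uses a hypothetical helper `is_branching`, with edges given as pairs (u, w) meaning u → w. It tests whether an edge set is a branching rooted at r. The small example mimics the situation of Fig. 1 with made-up vertex names: x is the root of the tree that does not contain r, and a is a non-root vertex of that tree.

```python
def is_branching(vertices, edges, r):
    """Check that `edges` (pairs (u, w) meaning u -> w) form a branching rooted at r:
    r has no outgoing edge, every other vertex has exactly one, and following
    outgoing edges from any vertex reaches r (so there is no cycle)."""
    out = {}
    for u, w in edges:
        if u in out:                  # a second outgoing edge from u
            return False
        out[u] = w
    if set(out) != set(vertices) - {r}:
        return False
    for v in vertices:
        seen = set()
        while v != r:
            if v in seen or v not in out:   # cycle, or a walk that leaves the vertex set
                return False
            seen.add(v)
            v = out[v]
    return True

V = ["r", "y", "x", "a"]
two_trees = [("y", "r"), ("a", "x")]    # tree {y, r} rooted at r; tree {a, x} rooted at x
print(is_branching(V, two_trees + [("x", "y")], "r"))   # True: x is the root of its tree
print(is_branching(V, two_trees + [("a", "y")], "r"))   # False: a has two outgoing edges
```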

Second, the analysis of the algorithm for undirected graphs uses the notion of “witness sets”. A witness set is a small set of nodes whose removal splits the graph into a large number of connected components. The ratio of the number of components to the number of nodes in the witness set is a lower bound on $\Delta^*$. Fürer and Raghavachari [3] showed that there are witness sets that can be used to find a tree whose degree is at most $\Delta^* + 1$. We will show that the notion of a witness set must also be modified for the case of directed graphs.



3.1 Witness Sets 

The minimum ratio of the cardinality of a vertex set W to the number of components that are generated when W is removed from the graph is called the toughness of the graph. Win [13] has shown an interesting relationship between the toughness of a graph and the MDST problem: if the toughness of a graph is at least $1/(k-2)$, then it has a spanning tree whose degree is at most k (for k > 2). Vertex sets for which the above ratio is close to the toughness of the graph are called witness sets. Fürer and Raghavachari [3] used such witness sets to establish a lower bound on the degree of an optimal tree for the MDST problem. They showed that if there is a witness set of size w whose removal splits G into t components, then $\Delta^* \geq \lceil (w + t - 1)/w \rceil$.
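For undirected graphs, the witness-set bound follows from a counting argument.

1. Let T be any spanning tree. T − W has at least t components, since each of its components lies inside a component of G − W.
2. A forest on n − w vertices with c components has n − w − c edges.
3. T − W keeps all n − 1 tree edges except the e_W edges incident to W. Hence n − 1 − e_W = n − w − c, so e_W = w + c − 1 ≥ w + t − 1.
4. Averaging e_W over the w vertices of W gives the bound.

The sketch below evaluates this bound. It assumes the graph is a Python dict of adjacency lists, and the function names are hypothetical.

```python
def components_without(adj, W):
    """Number of connected components of the undirected graph `adj` after deleting W."""
    seen, count = set(W), 0
    for s in adj:
        if s in seen:
            continue
        count += 1
        seen.add(s)
        stack = [s]
        while stack:                      # depth-first flood fill of one component
            u = stack.pop()
            for x in adj[u]:
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
    return count


def witness_lower_bound(adj, W):
    """ceil((w + t - 1) / w): a lower bound on the degree of every spanning tree."""
    w, t = len(W), components_without(adj, W)
    return -(-(w + t - 1) // w)           # integer ceiling division


# Star K_{1,5}: deleting the centre leaves five components.
star = {"c": ["1", "2", "3", "4", "5"], **{leaf: ["c"] for leaf in "12345"}}
print(witness_lower_bound(star, {"c"}))   # 5
```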
This definition is not suitable for directed graphs. We prove a new lemma that can be used to establish a lower bound on $\Delta^*$ for directed graphs. Suppose we have a set of witness vertices W and a set of blocking vertices B such that paths from different vertices of W do not intersect before being incident to a vertex in B (see Fig. 2). From this we show that there are |W| paths that have distinct edges into B, thus establishing a lower bound on the degree of vertices in B in any branching.




Fig. 2. Paths from W are internally disjoint 



Lemma 1. Let G = (V, E) be a directed graph and r ∈ V. Suppose there are subsets of vertices W ⊆ V and B ⊆ V that satisfy the following properties:

1. Any path from a vertex v ∈ W to r must have an incoming edge into a vertex in B.

2. For any two vertices v, w ∈ W, any path from v to r can intersect a path from w to r only after it passes through a vertex in B. In other words, G has no branching wherein the path from v to the least common ancestor of v and w does not contain a vertex of B.

Then the degree $\Delta^*$ of a DMDST of G rooted at r satisfies $\Delta^* \geq \lceil |W|/|B| \rceil$.

Proof. Let T* be an optimal branching rooted at r for the DMDST problem. Since it is a branching, it contains a path from any vertex to the root. By Condition 1 of the lemma, the path from a vertex v ∈ W to r contains at least one edge into a vertex in B. Let $f_v \neq v$ be the closest ancestor of v such that $f_v \in B$, and let $P_v$ be the path from v to $f_v$. By Condition 2 of the lemma, the paths $\{P_v : v \in W\}$ are all internally disjoint. Therefore we have identified |W| paths in T*, and each of these paths has a distinct incoming edge to some vertex in B. Therefore the average degree of a vertex in B is at least |W|/|B|, implying that there is at least one vertex in T* whose degree is $\lceil |W|/|B| \rceil$ or more.
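The counting in this proof can be replayed on a concrete branching. This is a sketch with hypothetical helper names and a parent-map representation (`parent[r] = None`). For each v ∈ W it walks up to $f_v$, checks that the paths $P_v$ are internally disjoint, counts their distinct last edges into B, and compares $\lceil |W|/|B| \rceil$ with the largest indegree in B.

```python
import math
from collections import Counter

def paths_to_blockers(parent, W, B):
    """For each v in W, the tree path P_v from v up to f_v, its closest proper ancestor in B."""
    paths = {}
    for v in W:
        path, u = [v], parent[v]
        while u is not None and u not in B:
            path.append(u)
            u = parent[u]
        if u is None:
            raise ValueError(f"the path from {v} never enters B (Condition 1 fails)")
        paths[v] = path + [u]
    return paths

def lemma1_report(parent, W, B):
    paths = paths_to_blockers(parent, W, B)
    interiors = [p[:-1] for p in paths.values()]        # every vertex of P_v except f_v
    disjoint = len({x for p in interiors for x in p}) == sum(len(p) for p in interiors)
    edges_into_B = {(p[-2], p[-1]) for p in paths.values()}
    indeg = Counter(parent[u] for u in parent if parent[u] is not None)
    return {
        "internally_disjoint": disjoint,
        "distinct_edges_into_B": len(edges_into_B),
        "bound": math.ceil(len(W) / len(B)),
        "max_degree_in_B": max(indeg[b] for b in B),
    }

parent = {"r": None, "b": "r", "x": "r", "w1": "b", "w2": "b", "w3": "x"}
print(lemma1_report(parent, {"w1", "w2", "w3"}, {"r", "b"}))
# {'internally_disjoint': True, 'distinct_edges_into_B': 3, 'bound': 2, 'max_degree_in_B': 2}
```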

4 The DMDST Algorithm 

Our algorithm starts with an arbitrary branching T of G and reduces the degree 
of high-degree nodes iteratively by applying improvement steps defined below. 
Consider a node v whose parent in the tree is p. We can decrease the indegree of p by 1 (which is an improvement step applied to p) if we can delete the edge (v, p) and find an alternate path for v to reach the root r. This new path from v to r initially goes through some nodes of $C_v$, the vertices in the subtree rooted at v, reaching a node $w \in C_v$ (w may be v itself). A new edge (w, x) is added (replacing the edge from w to its parent), where x is unrelated to v. Since x is unrelated to v, it is unrelated to any vertex in $C_v$. Therefore the path from x to r in T is unaffected. Since v can reach x after the improvement, v can reach r. We perform an improvement step only if, after the improvement, the vertices whose degrees increased have a smaller degree than p had before the step.




Fig. 3. Example of an improvement applied to vertex p 



Fig. 3 illustrates an example of an improvement step. In this example, the tree edges are shown in thick lines and the other edges of G are shown in dashed lines. The indegree of p is 5. If v can find an alternate path to r so that the edge (v, p) may be deleted from T, the degree of p can be decreased to 4. The edge (c, q) is deleted because the indegree of q is already 4, and if we chose to add this edge, its indegree would become 5. Decreasing the degree of p to 4 by increasing the degree of q to 5 (the old degree of p) does not make progress. The edge (v, p) is also deleted, and the algorithm tries to find a path from v to r. Such a path exists, (w → a → b → ... → r), and the algorithm uses this path to modify the branching; the new branching is shown in Fig. 3(b). The indegree of p has thus been successfully reduced to 4.

We now describe how to test whether such an improvement exists. Let the degree of p be $\Delta$. We first ensure that the degree of vertices whose degree is $\Delta - 1$ or greater does not increase. To do so, we delete all nontree edges of G that are incident into nodes of degree $\Delta - 1$ or greater, i.e., into $S_{\Delta-1}$. In the remaining graph, we delete the edge (v, p) and test whether there is a path from v to r. If such a path exists, we select a shortest such path and use it to make an improvement to p as follows. Let x be the vertex closest to v on the path such that $x \notin C_v$. For each edge (y, z) on the path from v to x, we replace the edge (y, p[y]) by the edge (y, z). It can be verified that this operation results in another branching, since the number of edges is still n − 1 and all vertices can still reach r.

Procedure Improvement(T, v, p)

1. Delete (v, p) from G.

2. Let $\Delta$ be the degree of p. For each vertex u ∈ V whose indegree in T is at least $\Delta - 1$, delete from G the edges going into u that are not in T.

3. Run breadth-first search (BFS) from v, and test whether the root r is reachable from v.

4. If there is no path from v to r, return False after restoring all edges of G.

5. Otherwise, BFS finds a path P from v to r. Let w be the first vertex on the path with the property that (w, x) ∈ P, $w \in C_v$ and $x \notin C_v$.

6. For each edge (a, b) in the subpath of P from v to x, replace the edge (a, p[a]) in T by (a, b).

7. Restore all edges of G and return True.

We now consider the DMDST algorithm. The algorithm tries to reduce the degree of high-degree vertices by finding suitable improvements. The target vertices are those whose degrees are within O(log n) of the maximum degree of the current branching. When no improvements are possible at these nodes, the algorithm terminates.

Algorithm DMDST(G, r) 

1. Find a branching T of G rooted at r. Let its degree be k. Fix some constant c > 1.

2. For each edge (v, p) ∈ T, run Improvement(T, v, p) if the degree of p in T is more than $k - \lceil \log_c n \rceil$. If the degree of T has changed, reset k to be its new degree.

3. Repeat the above step until Improvement(T, v, p) returns False for every edge (v, p) ∈ T for which it is called.

4. Return T.
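The following Python sketch puts Procedure Improvement and Algorithm DMDST together. It is an illustrative in-memory version under stated assumptions, not the authors' C implementation.

- G is an adjacency map `out_adj[u]` listing every x with (u, x) ∈ E.
- A branching is a parent map.
- Every vertex is assumed to be able to reach r.
- The initial branching is built by BFS from r over reversed edges, one of the choices discussed in Sect. 6.

Instead of physically deleting edges, the filter `usable` skips them during the search. All function names are hypothetical.

```python
import math
from collections import Counter, deque

# out_adj[u] lists every x such that (u, x) is an edge of G.
# A branching is a parent map: parent[u] = x encodes the tree edge (u, x); parent[r] is None.

def bfs_branching(out_adj, r):
    """Step 1 (one possible choice): BFS from r along reversed edges."""
    into = {}
    for u, succ in out_adj.items():
        for x in succ:
            into.setdefault(x, []).append(u)
    parent, queue = {r: None}, deque([r])
    while queue:
        x = queue.popleft()
        for u in into.get(x, []):
            if u not in parent:
                parent[u] = x
                queue.append(u)
    return parent

def indegrees(parent):
    return Counter(p for p in parent.values() if p is not None)

def in_subtree(parent, u, v):
    """True if v is an ancestor of u, i.e. u lies in C_v."""
    while u is not None:
        if u == v:
            return True
        u = parent[u]
    return False

def improvement(out_adj, parent, v, r):
    """Procedure Improvement(T, v, p): rewires `parent` in place, returns True on success."""
    p = parent[v]
    indeg = indegrees(parent)
    delta = indeg[p]

    def usable(u, x):
        if (u, x) == (v, p):                      # Step 1: the edge (v, p) is deleted
            return False
        # Step 2: tree edges stay; non-tree edges into S_{delta-1} are deleted
        return parent[u] == x or indeg[x] < delta - 1

    prev, queue = {v: None}, deque([v])           # Step 3: BFS from v
    while queue and r not in prev:
        u = queue.popleft()
        for x in out_adj.get(u, []):
            if x not in prev and usable(u, x):
                prev[x] = u
                queue.append(x)
    if r not in prev:                             # Step 4: no improvement at p
        return False
    path, u = [], r
    while u is not None:
        path.append(u)
        u = prev[u]
    path.reverse()                                # shortest path from v to r
    inside = {u for u in parent if in_subtree(parent, u, v)}   # C_v before rewiring
    for a, b in zip(path, path[1:]):              # Steps 5-6: rewire up to the first x outside C_v
        parent[a] = b
        if b not in inside:
            break
    return True

def dmdst(out_adj, r, c=2.0):
    """Algorithm DMDST(G, r) with a constant c > 1; returns the final parent map."""
    parent = bfs_branching(out_adj, r)
    window = math.ceil(math.log(len(parent), c))  # ceil(log_c n)
    changed = True
    while changed:                                # Step 3: repeat Step 2
        changed = False
        for v in list(parent):                    # Step 2: every tree edge (v, p)
            p = parent[v]
            if p is None:
                continue
            indeg = indegrees(parent)
            k = max(indeg.values())               # current degree of T, reset after each change
            if indeg[p] > k - window and improvement(out_adj, parent, v, r):
                changed = True
    return parent

G = {"a": ["r"], "b": ["r", "a"], "c": ["r", "b"]}
print(max(indegrees(bfs_branching(G, "r")).values()))   # 3: BFS makes r the parent of a, b, c
T = dmdst(G, "r")
print(T, max(indegrees(T).values()))   # {'r': None, 'a': 'r', 'b': 'a', 'c': 'b'} 1
```

In the usage example, BFS starts from a star of degree 3, and two improvements recover the Hamiltonian path c → b → a → r. $C_v$ is computed before rewiring. Recomputing it from the partially rewired map could follow a cycle through already-rewired path vertices.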

5 Analysis of the Algorithm 

The analysis of the running time of the algorithm uses potential functions that were introduced by Fürer and Raghavachari [3], and adapted by Ravi, Raghavachari and Klein [12] and Könemann and Ravi [7]. In fact our analysis of the running time is almost the same as in [12]. The potential of a vertex of degree $\Delta$ is defined to be …, and therefore the total potential of all the vertices is at most …, where k is the current degree of T. An improvement is applied to a vertex of degree $\Delta > k - \lceil \log_c n \rceil$. Each improvement step that targets a vertex of degree $\Delta$ reduces the total potential by at least …, since the degree of a node of degree $\Delta$ is reduced by 1 and the degree of all the other nodes may increase to $\Delta - 1$. Since $\Delta > k - \lceil \log_c n \rceil$, this reduction in potential is at least a
fraction n~ *°®<= of the current potential of the branching. It follows that the 
number of improvement steps is at most …. Each improvement step can
be implemented in 0{n?) time, thus giving a total running time of 0{n}°^<= ”+®) 
for the algorithm. 

The following lemma relates the running time of the algorithm and the num- 
ber of improvement steps, I. This expression for the running time is a more 
meaningful measure, since our experiments show that I grows only linearly with 
n, making the observed running time O(n^). 

Lemma 2. The running time of Algorithm DMDST is 0{n^I), where I is the 
number of improvement steps. 

The analysis of the degree bound of the tree output by the algorithm is 
more interesting. For this analysis, the notion of witness sets that was used by 
the algorithm for undirected graphs has to be strengthened. In directed graphs, 
there may be edges in the “wrong” direction that don’t help in constructing a 
branching, but they may stop the graph from falling apart when a few critical 
vertices are removed. 

We now show how to find a witness set W and its blocking set B for the branching T output by our algorithm. In fact, we will identify one pair of sets W and B for each $\Delta$ with $k - \lceil \log_c n \rceil < \Delta \leq k$.

Lemma 3. Let T be a branching whose degree is $\Delta$ or more. Let $S_\Delta$ be the set of vertices whose degree is $\Delta$ or more. There are at least $(\Delta - 1)|S_\Delta| + 1$ unrelated vertices such that the parent of each of these vertices is in $S_\Delta$.

Proof. The proof is by induction on the cardinality of $S_\Delta$. If $|S_\Delta| = 1$, then the single vertex in that set has at least $\Delta$ children, and the children of this vertex satisfy the lemma. If $|S_\Delta| > 1$, remove from T a node $v \in S_\Delta$ that has no descendants in $S_\Delta$ (other than itself), together with all its descendants. The resulting branching has $|S_\Delta| - 1$ nodes of degree $\Delta$ or more. By the induction hypothesis, it has at least $(\Delta - 1)(|S_\Delta| - 1) + 1$ unrelated nodes that are children of nodes in $S_\Delta$. Since all these nodes are unrelated to each other, at most one of them is an ancestor of v. Therefore at least $(\Delta - 1)(|S_\Delta| - 1)$ of these nodes are not ancestors of v. Adding the children of v to this set increases it by at least $\Delta$, so the number of nodes that we get is at least $(\Delta - 1)(|S_\Delta| - 1) + \Delta = (\Delta - 1)|S_\Delta| + 1$.
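Lemma 3 can also be checked constructively. The sketch below uses a hypothetical helper and a direct counting argument in place of the induction above. Take W to be the children of $S_\Delta$-vertices that have no descendant (themselves included) in $S_\Delta$.

- Such vertices are pairwise unrelated.
- At most $|S_\Delta| - 1$ children are excluded overall: each excluded child contains some $S_\Delta$-vertex whose nearest proper $S_\Delta$-ancestor is that child's parent, and the topmost $S_\Delta$-vertex has no such ancestor.
- Hence $|W| \geq \Delta|S_\Delta| - (|S_\Delta| - 1)$.

```python
from collections import Counter

def unrelated_children(parent, delta):
    """Return (S_delta, W): W holds the children of S_delta vertices that have no
    descendant (themselves included) in S_delta; such vertices are pairwise unrelated."""
    indeg = Counter(p for p in parent.values() if p is not None)
    S = {v for v in parent if indeg[v] >= delta}
    covers_S = set()                    # vertices with a descendant (or themselves) in S
    for s in S:
        u = s
        while u is not None and u not in covers_S:
            covers_S.add(u)
            u = parent[u]
    W = {v for v, p in parent.items() if p in S and v not in covers_S}
    return S, W

parent = {"r": None, "a": "r", "b": "r", "x": "r", "c": "a", "d": "a", "e": "a"}
S, W = unrelated_children(parent, 3)
print(sorted(S), sorted(W))   # ['a', 'r'] ['b', 'c', 'd', 'e', 'x']
```

Here $|W| = 5 = (3 - 1) \cdot 2 + 1$, so the bound of Lemma 3 is met with equality.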

Lemma 4. Let T be the branching output by our algorithm, and let its degree be k. Then for any $\Delta$ with $k - \lceil \log_c n \rceil < \Delta \leq k$,

$$\Delta^* \geq \frac{(\Delta - 1)|S_\Delta| + 1}{|S_{\Delta-1}|}.$$

Proof. Let W be the set of vertices as in Lemma 3 that are children of nodes in $S_\Delta$ but have no descendants in $S_\Delta$. We know that $|W| \geq (\Delta - 1)|S_\Delta| + 1$. Let B be $S_{\Delta-1}$, the set of all vertices whose degree is at least $\Delta - 1$. For each vertex v ∈ W, the algorithm tries to find an improvement that decreases the degree of p = p[v]. Since it failed (the condition under which the algorithm stops), any path from v to r that does not use (v, p) must go through a vertex x in $S_{\Delta-1}$. By construction, the internal vertices of the path from v to x are entirely contained in $C_v$, the descendants of v in T. Since all vertices of W are unrelated to each other, these subtrees are disjoint. Therefore the sets W and B that we have defined satisfy the conditions of Lemma 1, and

$$\Delta^* \geq \left\lceil \frac{|W|}{|B|} \right\rceil \geq \frac{(\Delta - 1)|S_\Delta| + 1}{|S_{\Delta-1}|}.$$



Theorem 1. The degree of the branching returned by our algorithm is at most $c\Delta^* + \log_c n$, where c > 1 is the constant in Step 1 of the DMDST algorithm.

Proof. Lemma 4 establishes a set of lower bounds on $\Delta^*$ for $\lceil \log_c n \rceil$ different values of $\Delta$. For at least one of these values of $\Delta$, $|S_{\Delta-1}| \leq c|S_\Delta|$. Using this value of $\Delta$, we get $k \leq c\Delta^* + \log_c n$.
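The pigeonhole step in this proof can be spelled out as follows. The derivation uses only Lemma 4, the definition of $S_\Delta$, and $|S_k| \geq 1$ (k is the degree of T). It yields $k < c\Delta^* + \lceil \log_c n \rceil$, which is the bound of Theorem 1 up to rounding of the logarithm.

```latex
\begin{aligned}
&\text{Let } L = \lceil \log_c n \rceil. \text{ If } |S_{\Delta-1}| > c\,|S_\Delta| \text{ for every } \Delta \text{ with } k-L < \Delta \le k, \text{ then}\\
&\qquad |S_{k-L}| > c^{L}\,|S_k| \ge c^{L} \ge n, \text{ contradicting } |S_{k-L}| \le n.\\
&\text{Hence some such } \Delta \text{ has } |S_{\Delta-1}| \le c\,|S_\Delta|, \text{ and Lemma 4 gives}\\
&\qquad \Delta^* \ge \frac{(\Delta-1)|S_\Delta|+1}{|S_{\Delta-1}|} \ge \frac{(\Delta-1)|S_\Delta|+1}{c\,|S_\Delta|} > \frac{\Delta-1}{c},
\quad\text{so}\quad \Delta < c\Delta^* + 1.\\
&\text{Since } \Delta > k-L, \text{ we get } k \le \Delta + L - 1 < c\Delta^* + L.
\end{aligned}
```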

6 Experimental Results 

The algorithms were implemented in C, using Knuth’s Stanford GraphBase toolkit [6], and tested on large numbers of randomly generated graphs. These random input graphs followed two different patterns, described individually below. The running time clearly depends upon the initial tree, and this was indeed observed in the experimental study. We generated the initial tree using both depth-first search (DFS) and breadth-first search (BFS). BFS tends to generate high-degree nodes, and therefore the number of improvement steps tends to be much higher than if the initial tree is generated by DFS. In dense graphs, DFS generally finds low-degree trees by itself. Although our algorithm further reduces the degree of the initial DFS tree significantly, we present experimental results with an initial BFS tree, because we wished to capture the worst-case performance of the algorithm.

6.1 Uniformly Distributed Random Graphs 

For this class of randomly generated input graphs, the probability that an edge exists between two vertices is set to a constant. The number of vertices was varied from 100 to 9000, and a small number of runs were on 20,000-node graphs. We also varied the density of the graph. This class of graphs tends to be Hamiltonian, and a good algorithm should be able to find a low-degree branching. We observed that our algorithm has no difficulty in achieving this.

The results for this class of graphs are presented in Fig. 4, which shows the number of improvement steps as a function of n. The x-axis shows n and the y-axis shows the average number of improvement steps in random graphs with uniform edge probability. For each value of n, the algorithm was run on several random instances, and the average number of improvement steps over these instances was used to generate this plot. Note that the number of improvement steps is almost linear in n.

Fig. 4. Number of improvement steps versus n

In all of our test cases, the algorithm found a branching of degree two or less, irrespective of how large n was or how bad the initial degree of the tree was. For problems of small size (n less than 100), the algorithm would often return a branching of degree one, i.e., a Hamiltonian path.

Fig. 5 shows the degree of the intermediate trees as the algorithm progresses on a 20,000-node random graph with uniform edge probability. Observe that the algorithm makes rapid progress initially, since there are very few vertices of high degree. Once the degree becomes small, a larger number of improvements is needed to decrease the degree of the tree. If one terminated the algorithm early (say, to meet a fixed deadline), the current tree could be used without a big sacrifice in quality. The shape of the curve shows that we get about 50% of the progress in about 10% of the time.




Fig. 5. Degree of tree versus number of improvements on a 20,000 node graph 
6.2 Graphs with a Hidden Hamiltonian Path

The second class of graphs we tested was deliberately constructed to provide “bad” inputs, to really test the algorithm’s power and efficacy. The graphs in this class were generated by first obtaining a random bipartite graph with $n_1$ vertices on one side and $n_2$ vertices on the other side (say $n_1 \leq n_2$). The ratio $n_2/n_1$ was varied while holding $n = n_1 + n_2$ constant. Finally, we added a random Hamiltonian path to each of these bipartite graphs. Without this “hidden” path, the degree of any branching in the bipartite graph is at least $n_2/n_1$, since vertices in the tree must alternate between the two sides. We wanted to see if the algorithm finds this hidden path.
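A generator for inputs of this kind might look like the sketch below. It is a hypothetical reconstruction, not the authors' generator. In particular, the edge model (each left/right pair gets an edge in each direction independently with probability p) is an assumption, since the text does not specify how the random bipartite digraph is drawn. The hidden path visits all vertices in a random order, so a branching of degree 1 rooted at its last vertex always exists.

```python
import random

def hidden_path_graph(n1, n2, p, seed=None):
    """Random bipartite digraph on n1 + n2 vertices plus a hidden Hamiltonian path.

    Assumption: every (left, right) pair gets an edge in each direction independently
    with probability p; the paper does not specify its edge model.
    Returns (edge set, hidden path as a list of edges, last vertex of the path).
    """
    rng = random.Random(seed)
    left, right = list(range(n1)), list(range(n1, n1 + n2))
    edges = set()
    for u in left:
        for w in right:
            if rng.random() < p:
                edges.add((u, w))
            if rng.random() < p:
                edges.add((w, u))
    order = left + right
    rng.shuffle(order)                     # the hidden path visits vertices in this order
    path = list(zip(order, order[1:]))
    edges.update(path)
    return edges, path, order[-1]          # the path's last vertex is a natural root

edges, path, root = hidden_path_graph(10, 40, 0.1, seed=1)
print(len(path), root == path[-1][1])      # 49 True
```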

The algorithm performed very well even on these bad input instances, always returning a tree of degree at most 2 in the end. As shown in Fig. 6, the number of improvement steps varies significantly with the ratio $n_2/n_1$, a measure of the badness of the input graph. As expected, the algorithm takes longer as the ratio gets closer to 1. We observed that in all cases, the number of improvements is a small multiple of n.




Fig. 6. The x-axis shows the ratio $n_2/n_1$ and the y-axis shows the average number of improvement steps for bipartite graphs with a total of 1000 vertices. The input graphs for this plot were randomly generated bipartite graphs that were augmented with a hidden Hamiltonian path.



7 Conclusions 

We have presented an approximation algorithm for the directed minimum-degree spanning tree problem. We introduced a new notion of witness sets that works in directed graphs. Although we could not prove a polynomial running time for our algorithm, the experimental evidence suggests that it is fast in practice. Several open questions follow from this work. Is it possible to implement our algorithm to run in polynomial time? Currently, the Steiner version of the DMDST problem does not have an approximation algorithm even with an O(log n) approximation ratio.
We describe an example that shows why our current approach fails in the Steiner version of the DMDST problem. In this example, there is a high-degree node p of degree k, whose children are $c_1, \ldots, c_k$. Each of these k children has an edge into a vertex s, which is not in the current Steiner tree, and s has an edge into p. It can be verified that no improvement is possible for vertex p. However, the degree of p can be reduced to $\lfloor k/2 \rfloor + 1$ by connecting $\lceil k/2 \rceil$ of p's children through s. In fact, if the graph has a number of other extra nodes similar to s, the degree of p can be reduced even to 2. This example shows that our algorithm does not guarantee any performance bound on the degree of the tree in the Steiner case. We could not apply Lemma 1 because the paths from $c_1$ through $c_k$ (the nodes in W) to r intersect each other at s before reaching p (which forms the set B), thus violating Condition 2 of the lemma.
Abstract. We present an algorithm to answer a set Q of range counting queries over a point set P in d dimensions. The algorithm takes
O (sort(|P| + IQI) -b (l^l+lgb^d^l) log^/^ I/Os and uses lin- 

ear space. For an important special case, the a(|P|) term in the I/O- 
complexity of the algorithm can be eliminated. We apply this algorithm 
to constructing t-spanners for point sets in $\mathbb{R}^d$ and polygonal obstacles in the plane, and to finding the K closest pairs of a point set in $\mathbb{R}^d$.



1 Introduction 

Motivation. Range searching and range counting problems have applications 
to spatial databases, geographic information systems, statistical analysis, and 
problems in computational geometry [1]. Conical Voronoi diagrams and θ-graphs first appeared in the early 80’s as the base structure of the so-called region approach for solving nearest neighbor problems and for constructing a minimum spanning tree of a given set S of N points in d-dimensional Euclidean space [23].
These structures have numerous further applications and a rich history in various 
fields of computer science, e.g., in motion planning [11], construction of spanner 
graphs [16,18,19], problems on communication networks [15], for approximating 
the Euclidean minimum spanning tree [23] of a given point set, and for real- 
time walkthrough in virtual scenes [12]. In most of these domains, realistic data 
sets are too large to fit into the internal memory of state-of-the-art computers. 
Therefore special techniques are required to reduce the overhead involved in 
accessing secondary storage (i.e., disks). Existing internal memory techniques 
cannot be adapted, as random access into secondary memory is far too expensive. 

Range counting. Range counting is a special kind of range searching problem. Given a point set P and a query range $q = [x_1, x_1'] \times [x_2, x_2'] \times \cdots \times [x_d, x_d']$, the standard range searching problem is to report all points in $P \cap q$. It is easy to answer a single query of this type in optimal O(scan(N)) I/Os. Typically, however, we are either presented with a batch of queries (the batched scenario), or we want to build a data structure that can answer queries in o(scan(N)) I/Os per query, depending on the size of the output (the online scenario). I/O-efficient solutions for the online range searching problem have been presented in [3,7,20]. When solving the batched range searching problem, our goal is to minimize the total number of I/Os spent on answering all queries. An I/O-efficient solution to this problem has been presented in [2].

* Research supported by NSERC, NCE GEOIDE, and DFG-SFB376.

¹ sort(N) denotes the I/O-complexity of sorting N data items in external memory. See Model of Computation.

R. Hariharan, M. Mukund, and V. Vinay (Eds.): FSTTCS 2001, LNCS 2245, pp. 244–255, 2001.
© Springer-Verlag Berlin Heidelberg 2001

I/O-Efficient Batched Range Counting 245

While range searching asks to report all points in $P \cap q$, range counting asks to report a value $\bigotimes_{p \in P \cap q} \lambda(p)$, where $\otimes$ is a commutative and associative operator and $\lambda(p)$ is a label assigned to point p. Important special cases include counting the elements in $P \cap q$ (the labels are 1, and $\otimes$ is standard addition) and finding the minimum point in $P \cap q$ with respect to some arbitrary weighting scheme.
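As a correctness reference for the definition (not an I/O-efficient method), a brute-force version simply folds the operator over the labels of the points inside each query box. The sketch below uses a hypothetical function name and represents a query as a list of closed intervals, one per dimension. It costs O(|P| · |Q| · d) time.

```python
import math
import operator
from functools import reduce

def batched_range_count(points, labels, queries, op=operator.add, identity=0):
    """Answer each query q = [(x1, x1'), ..., (xd, xd')] with op folded over the
    labels of the points inside q. Brute force: a reference, not an EM algorithm."""
    answers = []
    for q in queries:
        inside = (lab for pt, lab in zip(points, labels)
                  if all(lo <= x <= hi for x, (lo, hi) in zip(pt, q)))
        answers.append(reduce(op, inside, identity))
    return answers

P = [(1, 1), (2, 3), (4, 2)]
print(batched_range_count(P, [1, 1, 1], [[(0, 3), (0, 5)]]))                 # [2]  (counting)
print(batched_range_count(P, [5, 2, 7], [[(0, 5), (1, 3)]], min, math.inf))  # [2]  (minimum weight)
```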

θ-Graphs. Given a set $\{p_0, \ldots, p_d\}$ of points in $\mathbb{R}^d$ such that the vectors $(p_i - p_0)$, $1 \leq i \leq d$, are linearly independent, we define the simplicial cone spanned by points $p_0, \ldots, p_d$ as the set $\{p_0 + \sum_{i=1}^{d} \lambda_i (p_i - p_0) : \lambda_i \geq 0\}$. We call $p_0$ its apex. A θ-frame is a set C of simplicial cones with the following properties:

- each cone has its apex at the origin;
- $\bigcup_{c \in C} c = \mathbb{R}^d$;
- each cone $c_i \in C$ contains a ray $l_i$ emanating from the origin such that, for any other ray l in the cone emanating from the origin, $\angle(l_i, l) \leq \theta/2$.

We call $l_i$ the cone axis of $c_i$. In [19], it is shown how to construct a θ-frame of size $(d/\theta)^{O(d)}$.

Given a θ-frame C and a point p, let $c_0(p), \ldots, c_{k-1}(p)$ and $l_0(p), \ldots, l_{k-1}(p)$ be translated copies of cones $c_0, \ldots, c_{k-1} \in C$ and rays $l_0, \ldots, l_{k-1}$ such that the apexes of cones $c_i(p)$ and the endpoints of rays $l_i(p)$ coincide with point p. For $p, q \in P$ and $0 \leq i < k$, the distance $\mathrm{dist}_{c_i}(p, q)$ between p and q with respect to cone $c_i$ is defined as follows:

- if q is contained in $c_i(p)$, it is the Euclidean distance between p and the projection of q onto the translated ray $l_i(p)$;
- otherwise, it is infinity.

The Euclidean distance between two points p and q is denoted by $\mathrm{dist}_2(p, q)$. For each point $p \in P$ and $0 \leq i < k$, let $P_{c_i}(p) = P \cap c_i(p)$.

The K-th order θ-graph $G_{\theta,K}(P)$ is defined as follows. The points of P are the vertices of $G_{\theta,K}(P)$. For every point $p \in P$ and every cone $c_i \in C$, we add directed edges from p to the $K^* = \min(K, |P_{c_i}(p)|)$ vertices in $P_{c_i}(p)$ that are closest to p with respect to the distance function $\mathrm{dist}_{c_i}$.
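In the plane, these definitions reduce to angular sectors, which allows a compact brute-force sketch. The following assumptions and names are illustrative only and do not reproduce the construction of [19].

- The cones are the k sectors $[i\theta, (i+1)\theta)$ with $\theta = 2\pi/k$.
- Each cone axis is the bisecting ray.
- $k \geq 3$, so every sector is a proper (simplicial) planar cone.
- The function name is hypothetical, and the method takes quadratic time.

```python
import math

def theta_graph_2d(points, k, K=1):
    """K-th order theta-graph of planar points using k cones of angle theta = 2*pi/k (k >= 3).

    Assumption: cone i (apex at p) covers directions [i*theta, (i+1)*theta) and its
    axis l_i is the bisecting ray. Returns directed edges (a, b) as index pairs.
    """
    theta = 2 * math.pi / k
    edges = set()
    for a, p in enumerate(points):
        cones = [[] for _ in range(k)]
        for b, q in enumerate(points):
            if a == b:
                continue
            dx, dy = q[0] - p[0], q[1] - p[1]
            angle = math.atan2(dy, dx) % (2 * math.pi)
            i = min(int(angle // theta), k - 1)      # guard against rounding near 2*pi
            axis = (i + 0.5) * theta
            # dist_{c_i}(p, q): length of the projection of q - p onto the cone axis
            cones[i].append((dx * math.cos(axis) + dy * math.sin(axis), b))
        for cone in cones:
            for _, b in sorted(cone)[:K]:           # the K* = min(K, |P_{c_i}(p)|) nearest
                edges.add((a, b))
    return edges

pts = [(0, 0), (1, 0), (2, 0), (0, 1)]
print(sorted(e for e in theta_graph_2d(pts, 4) if e[0] == 0))   # [(0, 1), (0, 3)]
```

With K = 2, the farther point (2, 0) in the same cone also receives an edge from (0, 0).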

A t-spanner, t > 1, for a point set P is a straight-line graph G with vertex set P such that, for each pair p, q ∈ P of vertices, the shortest path from p to q in G is at most t times longer than the Euclidean distance from p to q; the length of a path is the sum of the Euclidean lengths of its edges. We call such a shortest path a t-spanner path from p to q; t is called the stretch factor of G. In [19], it is shown that in a fixed dimension d ≥ 2 and for 0 < θ < π/3, the θ-graph $G_\theta(P)$ is a $\frac{1}{1 - 2\sin(\theta/2)}$-spanner for P, and that it can be constructed in $O(N \log^{d-1} N)$ time using $O(N \log^{d-1} N)$ space.
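The stretch factor of a concrete graph can be measured with one Dijkstra run per source. The sketch below is a quadratic-space, in-memory check with hypothetical names. It treats every edge as an undirected straight-line segment, which is an assumption that is convenient for checking spanner paths. It also evaluates the bound $1/(1 - 2\sin(\theta/2))$ quoted above for 0 < θ < π/3.

```python
import heapq
import math

def stretch_factor(points, edges):
    """Largest ratio (shortest-path length in G) / (Euclidean distance) over all pairs.

    Assumption: edges are treated as undirected straight-line segments.
    Returns math.inf if some pair of points is disconnected.
    """
    n = len(points)
    adj = [[] for _ in range(n)]
    for a, b in edges:
        w = math.dist(points[a], points[b])
        adj[a].append((b, w))
        adj[b].append((a, w))
    worst = 1.0
    for s in range(n):
        dist = [math.inf] * n
        dist[s] = 0.0
        heap = [(0.0, s)]
        while heap:                                  # Dijkstra from s
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        for t in range(s + 1, n):
            euclid = math.dist(points[s], points[t])
            if euclid > 0:
                worst = max(worst, dist[t] / euclid)
    return worst

def theta_graph_stretch_bound(theta):
    """The bound 1 / (1 - 2 sin(theta/2)) quoted above, for 0 < theta < pi/3."""
    return 1.0 / (1.0 - 2.0 * math.sin(theta / 2))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(round(stretch_factor(square, [(0, 1), (1, 2), (2, 3), (3, 0)]), 6))   # 1.414214
```

The 4-cycle on a unit square has stretch factor √2, attained by the diagonal pairs.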

A spanner graph G is K-fault tolerant if, after removing at most K vertices or edges from G, the remaining graph still contains a path between each pair p, q of vertices which is at most t times longer than the shortest path between p and q in the complete Euclidean graph after removing the same set of vertices or edges [17]. In [18], it is shown that the K-th order θ-graph is K-fault tolerant.

The θ-graph $G_\theta(O)$ for a set O of simple polygonal obstacles in the plane is defined as follows. The vertex set of $G_\theta(O)$ is the set of obstacle vertices in O. Each vertex v is connected to the nearest visible vertex in each cone c(v) with respect to $\mathrm{dist}_c$. This graph was introduced in [11] to solve the approximate shortest path problem in two and three dimensions: given two points s and t and a constant ε > 0, find an obstacle-avoiding path that is at most (1 + ε) times longer than the shortest obstacle-avoiding path between s and t.

Conical Voronoi diagrams. The conical Voronoi diagram $\mathrm{CVD}_c(P)$ of a set P of points is closely related to the θ-graph. For a cone c, the conical Voronoi region of a point $p \in P$ is the set of points in the plane having p as the closest point in P with respect to $\mathrm{dist}_c$. $\mathrm{CVD}_c(P)$ is the planar subdivision defined by the Voronoi regions $V_p$, $p \in P$. Similarly, the conical Voronoi diagram $\mathrm{CVD}_c(O)$ of a set O of simple polygonal obstacles is the planar subdivision defined by the obstacles and the Voronoi regions $V_p$ of the obstacle vertices. Here $V_p$ contains the points $x \in \mathbb{R}^2$ having obstacle vertex p as the closest visible vertex with respect to $\mathrm{dist}_c$.

Model of computation and related results. Our algorithms are designed and analyzed in the Parallel Disk Model (PDM) [22]. An external memory consisting of D disks is attached to a machine with an internal memory of size M. Each of the D disks is divided into blocks of B consecutive data items. Up to D blocks, at most one per disk, can be transferred between internal and external memory in a single I/O operation. The complexity of an algorithm is the number of I/O operations it performs. Many external memory (EM) algorithms and techniques for fundamental problems in computational geometry, graph theory, GIS, matrix computations, etc. have been developed in this model. Due to lack of space, we refer the reader to the survey [21] and mention only the most relevant work here. It has been shown that sorting an array of size N takes $\mathrm{sort}(N) = \Theta\left(\frac{N}{DB}\log_{M/B}\frac{N}{B}\right)$ I/Os [21,22], and scanning an array of size N takes $\mathrm{scan}(N) = \Theta\left(\frac{N}{DB}\right)$ I/Os. EM algorithms for the following geometric problems in the plane are discussed in [2,3,7,13,20]: computing pairwise intersections of orthogonal line segments, answering range queries, finding all nearest neighbors for a set of N points, dominance problems, and others. General line segment intersection problems have been studied in [6]. For lower bounds on computational geometry problems in EM, see [5]. See [4] for buffer trees, priority queues, and their applications.
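To get a feel for the magnitudes of sort(N) and scan(N), here is a rough, constant-free estimate. It is an illustrative model with hypothetical names, not a cost formula from [21,22].

- A scan moves D blocks of B items per I/O.
- A sort is modelled as one scan per pass of an idealized $(M/B)$-way merge sort, i.e. $\lceil \log_{M/B}(N/B) \rceil$ passes (at least one).
- The number of passes is computed with integers to avoid floating-point rounding in the logarithm.
- M ≥ 2B is assumed.

```python
def scan_ios(N, D, B):
    """scan(N) ~ N / (DB): each parallel I/O moves up to D blocks of B items."""
    return -(-N // (D * B))

def merge_passes(N, M, B):
    """ceil(log_{M/B}(N/B)), at least 1, using integers only (assumes M >= 2B)."""
    m, blocks = M // B, -(-N // B)
    passes, reach = 1, m
    while reach < blocks:
        reach *= m
        passes += 1
    return passes

def sort_ios(N, M, D, B):
    """sort(N) ~ (N/(DB)) log_{M/B}(N/B), modelled as one scan per merge pass."""
    return scan_ios(N, D, B) * merge_passes(N, M, B)

# 2^30 items, a 2^20-item memory, 2^10-item blocks, a single disk:
print(scan_ios(2**30, 1, 2**10), sort_ios(2**30, 2**20, 1, 2**10))   # 1048576 2097152
```

Because $M/B$ is large in practice, the logarithmic factor is a small constant, which is why sort(N) is often only a few times scan(N).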

Overview. In Sect. 2, we discuss our solution to the batched range counting problem. In Sect. 3, we use the solution for a special case of this problem to compute K-th order θ-graphs for point sets in d dimensions, and we discuss how to report a spanner path I/O-efficiently. In Sect. 4, we apply K-th order θ-graphs to solve the K-closest pairs problem in d dimensions. Finally, in Sect. 5, we show how to compute the θ-graph and the conical Voronoi diagram of a given set of disjoint simple polygonal obstacles in the plane I/O-efficiently.

In order to solve the batched range counting problem, one could use the 
range searching algorithm of [2] to report the points in each query range, and 
then count the points reported for each query. Using this strategy, the complexity 
of answering a query depends on the number of points in the query range, which 
can be as large as N. Our solution is independent of the number of points in 
each query range. 

In [14], an asymptotically more efficient construction for spanners of point sets in d dimensions is presented. However, this construction is based on a well-separated pair decomposition [10] of the point set, and it has three drawbacks:

- the constants hidden in the big-Oh notation are extremely large;
- the construction works only for point sets;
- the approach cannot be used to construct K-fault tolerant spanners.

For moderate dimensions, we expect our algorithm to compare favorably with the one of [14]. Similarly, the solution to the K-closest pair problem presented in [14] is asymptotically more efficient than ours, but for moderate dimensions we expect our algorithm to be more efficient.

Our construction of 0-graphs for sets of polygons is based on the construction 
of [11]. However, the algorithm of [11] is not I/O-efficient and cannot easily 
be made I/O-efficient. We prove a number of interesting properties of conical 
Voronoi diagrams to obtain an I/O-efficient solution to this problem. 

2 Batched Higher-Dimensional Range Counting 

In this section, we present I/O-efficient algorithms for the batched d-dimensional range counting problem. First, we consider the important special case where w.l.o.g. $x_1' = \infty$ for all query ranges $[x_1, x_1'] \times \cdots \times [x_d, x_d']$. Our solution for this case is used in Sections 3 and 5. For the general case, we present a more complicated algorithm, which is slower by a factor of $O(\alpha(|P|))$.

2.1 Colorable Problems 

To solve the batched range counting problem, we apply the framework of colorable search problems defined in [2], although we extend it slightly. This framework can be used to derive an algorithm solving a search problem in $\mathbb{R}^{d+1}$, $d \geq 1$, from an algorithm solving the same search problem in $\mathbb{R}^d$.

Let $\mathcal{P}$ be a batched search problem answering a set Q of queries over a point set P in $\mathbb{R}^d$. Given a set C of colors, we define a coloring $\mathcal{C}$ that assigns a color $c_q \in C$ to every query $q \in Q$ and a set $C_p \subseteq C$ of colors to every point $p \in P$. Every color $c \in C$ defines a point set $P_c = \{p \in P : c \in C_p\}$. Let $\mathcal{P}_\mathcal{C}$ be the problem of answering queries $q \in Q$ with respect to the point sets $P_{c_q}$.

We call $\mathcal{P}$ $(I_p, I_r, S)$-$m^\varepsilon$-colorable in dimension d, for some constant $0 < \varepsilon < 1$, if the following holds for every coloring $\mathcal{C}$ with $|C| = O(\sqrt{m^\varepsilon})$ and with $O(m^\varepsilon)$ different color sets assigned to the points in P. There exists an algorithm A that solves problem $\mathcal{P}_\mathcal{C}$ and can be divided into phases $A_p^{(1)}, A_r^{(1)}, A_p^{(2)}, A_r^{(2)}, \ldots, A_r^{(k-1)}, A_p^{(k)}$ such that:

- phases $A_p^{(1)}, \ldots, A_p^{(k)}$ take $I_p$ I/Os in total;
- the total I/O-complexity of phases $A_r^{(1)}, \ldots, A_r^{(k-1)}$ is $I_r$;
- A uses S space;
- phases $A_p^{(1)}, \ldots, A_p^{(k)}$ are independent of d.

The idea behind this definition is that we can use algorithm A to derive an algorithm B solving $\mathcal{P}$ in $\mathbb{R}^{d+1}$ simply by replacing phases $A_r^{(1)}, \ldots, A_r^{(k-1)}$ by phases $B_r^{(1)}, \ldots, B_r^{(k-1)}$ that deal with the extra dimension. That is, algorithm B consists of phases $A_p^{(1)}, B_r^{(1)}, A_p^{(2)}, B_r^{(2)}, \ldots, A_p^{(k-1)}, B_r^{(k-1)}, A_p^{(k)}$.

We solve the search problem to be addressed by phase $B_r^{(i)}$ using a buffer
tree [4] of degree $\sqrt{m^c}$ and algorithm $A_r^{(i)}$. The buffer tree is built over the
coordinates of the points in $P$ in the $(d+1)$-st dimension. Queries are filtered from
the root of the tree to the leaves. At every level, each query $q$ is answered w.r.t.
the maximal multislab spanned by $q$. A point is colored with the colors of the
multislabs that it is contained in. Hence, we have to solve a colored version of
the $d$-dimensional problem at every level, which we do using $A_r^{(i)}$. Adapting the
proof of [2] to this more general framework, we obtain the following result.
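The coloring used at one level of the buffer tree can be illustrated in memory as follows. This is a hypothetical sketch, not the paper's code. A level is assumed to be given by its sorted slab boundaries $b_0 < b_1 < \cdots$, and interior slab $i$ is $[b_{i-1}, b_i)$. A multislab is a pair $(i, j)$ of slab indices, the two unbounded outer slabs are ignored, and the helper names are made up for illustration.

```python
import bisect

def query_multislab(boundaries, a, b):
    """Maximal multislab (i, j) completely spanned by the query [a, b], or None.
    Interior slab i (1 <= i < len(boundaries)) is [boundaries[i-1], boundaries[i])."""
    i = bisect.bisect_left(boundaries, a) + 1   # first slab whose left boundary is >= a
    j = bisect.bisect_right(boundaries, b) - 1  # last slab whose right boundary is <= b
    return (i, j) if i <= j else None

def point_multislabs(boundaries, x):
    """Colors of point x: all multislabs (i, j) that contain the slab of x."""
    s = bisect.bisect_right(boundaries, x)      # x lies in [boundaries[s-1], boundaries[s])
    k = len(boundaries)
    if s == 0 or s == k:                        # unbounded outer slabs: no multislab
        return set()
    return {(i, j) for i in range(1, s + 1) for j in range(s, k)}
```

At this level, a point is counted for a query exactly when the query's multislab is among the point's colors, that is, when the point's slab lies inside that multislab.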

Theorem 1. If a batched search problem $\mathcal{P}$ is $(I_p, I_r, S)$-$m^c$-colorable in $\mathbb{R}^d$,
then its $(d+1)$-dimensional version is $(I_p, I_r \cdot \log(\ldots), S)$-$m^c$-colorable.

Corollary 1. If a batched search problem $\mathcal{P}$ is $(I_p, I_r, S)$-$m^c$-colorable in $\mathbb{R}^1$,
then its $d$-dimensional version can be solved in $O(I_p + I_r \cdot \ldots)$ I/Os
using $O(S)$ blocks of external memory.

We now apply this framework to solve the batched range counting problem. 

2.2 Partially Unbounded Queries 

The following algorithm shows that if w.l.o.g. $x'_1 = +\infty$ for all queries $q \in Q$, the
batched range counting problem is $(\mathrm{sort}(|P| + |Q|), \mathrm{scan}(|P| + |Q|), \ldots)$-$m$-colorable
for $d = 1$: Sort all queries by their left endpoints and all points by their
$x_1$-coordinates (Phase $A_p^{(1)}$). Scan $P$ and $Q$ in lock-step fashion, simulating a
line sweep from $+\infty$ to $-\infty$. During the sweep, maintain a value
$\Pi = \bigotimes\{\lambda(p) : p \in P \text{ and } x_1(p) \geq x_\ell\}$, where $x_\ell$ is the current position of the sweep
line. When the sweep line passes a point $p$, update $\Pi_{\mathrm{new}} \leftarrow \lambda(p) \otimes \Pi_{\mathrm{old}}$. When
the sweep line passes the left endpoint of a query $q$, report $\Pi$ as the answer to
query $q$ (Phase $A_r^{(1)}$). In order to make this solution $m$-colorable, maintain $m$
separate products $\Pi_{C_i}$, one per color class $C_i$ in the coloring. In order to report
the answer to query $q$, compute $\Pi = \bigotimes\{\Pi_{C_i} : c_q \in C_i\}$ when the sweep passes
the left endpoint of $q$. Using Cor. 1, we obtain the following result.
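The colored sweep can be sketched in memory as below. This is an illustrative Python sketch, not the external-memory implementation. Sorting stands in for Phase $A_p^{(1)}$ and the loop stands in for Phase $A_r^{(1)}$. Color classes are represented by `frozenset`s, and the semigroup operation `op` is assumed to be associative and commutative; addition gives plain range counting. The function name is hypothetical.

```python
from functools import reduce

def colored_sweep(points, queries, op, identity):
    """Colored sweep for one-dimensional queries [x, +inf).

    points:  list of (x, label, color_set); color_set is a frozenset C_p
    queries: list of (x, color) with color = c_q
    Returns the answer for each query, in input order.
    """
    products = {}                       # one running product Pi_{C_i} per color class
    events = [(x, 0, i) for i, (x, _, _) in enumerate(points)]
    events += [(x, 1, j) for j, (x, _) in enumerate(queries)]
    # Sweep from +inf to -inf; at equal x, add points before answering queries,
    # so that a point on a query's left endpoint is counted (closed range).
    events.sort(key=lambda e: (-e[0], e[1]))
    answers = [identity] * len(queries)
    for _, kind, idx in events:
        if kind == 0:
            _, label, cset = points[idx]
            products[cset] = op(label, products.get(cset, identity))
        else:
            _, c = queries[idx]
            # Combine the products of all color classes that contain c_q.
            answers[idx] = reduce(op, (pi for cset, pi in products.items() if c in cset), identity)
    return answers
```

The products are combined in dictionary order, which is why `op` is assumed to be commutative.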

Theorem 2. It takes $O(\mathrm{sort}(|P| + |Q|) + \ldots)$ I/Os and linear
space to answer a set $Q$ of range counting queries over a point set $P$ in $\mathbb{R}^d$,
provided that w.l.o.g. $x'_1 = +\infty$ for all queries $[x_1, x'_1] \times \cdots \times [x_d, x'_d] \in Q$.
2.3 Bounded Queries

We turn to the general case now, where every query $q \in Q$ is a $d$-dimensional
box. Again, we present a solution for the case $d = 1$ and generalize it to higher
dimensions by applying Cor. 1. We define a sequence of functions as follows:
$\phi_0(x) = \lceil x/2 \rceil$ and $\phi_i(x) = \min\{j \geq 0 : \phi_{i-1}^{(j)}(x) \leq B\}$, for $i > 0$, where $f^{(i)}$
is defined as $f^{(0)}(x) = x$ and $f^{(i)}(x) = f(f^{(i-1)}(x))$, for $i > 0$. Thus, $\phi_1(x) =
\log_2 x$, $\phi_2(x) = \log^* x$, and so on. It is an exercise to show that $\phi_{\alpha(N)}(N) =
O(\alpha(N))$, where $\alpha(N)$ is the inverse of Ackermann's function. We develop a
series of algorithms proving the following lemma.
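The functions $\phi_i$ can be evaluated directly from the recursive definition. In this small Python sketch, the stopping threshold of the inner minimization is passed in as a parameter, named `threshold` here as an assumption of the sketch. With threshold 1, $\phi_1(x) = \lceil \log_2 x \rceil$, and $\phi_2$ counts how often $\phi_1$ must be applied, which is an iterated-logarithm function.

```python
import math

def iterate_until(f, x, threshold):
    """Smallest j >= 0 such that applying f j times to x gives a value <= threshold."""
    j = 0
    while x > threshold:
        x = f(x)
        j += 1
    return j

def phi(i, x, threshold):
    """phi_0(x) = ceil(x / 2); phi_i(x) = min{j >= 0 : phi_{i-1}^{(j)}(x) <= threshold}."""
    if i == 0:
        return math.ceil(x / 2)
    return iterate_until(lambda y: phi(i - 1, y, threshold), x, threshold)
```

For instance, `phi(1, 1024, 1)` returns 10 and `phi(2, 65536, 1)` returns 4.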

Lemma 1. For every $k > 0$, the one-dimensional batched range counting problem
is $(I_p, I_r, S)$-$m^c$-colorable, where $I_p \leq tk(\mathrm{sort}(|Q|) + \mathrm{scan}(|P|)) + t \cdot \mathrm{sort}(|P|)$,
$I_r \leq tk(\mathrm{scan}(|Q|) + \mathrm{scan}(|P|)\phi_k(|P|))$, for some constant $t > 0$, and $S = \ldots$
Proof sketch. Consider the case $k = 1$. The first phase $A_p^{(1)}$ preprocesses $P$ and
$Q$ as follows: Sort the points in $P$ by their $x_1$-coordinates. Let $T$ be a balanced
binary tree over the points in $P$. (We do not construct $T$, but use it only as a
conceptual tool.) With every node $v \in T$, associate a value $x_v$ separating the
points in the left subtree from the points in the right subtree. Associate every
query $q \in Q$ with the highest node $v_q \in T$ such that $q$ spans $x_{v_q}$. Split query
$q$ into two parts $q_l$ and $q_r$ to the left and right of $x_{v_q}$. This produces two sets
$Q_l$ and $Q_r$ of left and right subqueries. Sort the queries in $Q_l$ according to their
corresponding nodes $v_q$, ordered top-down in $T$ and from left to right at every level;
queries associated with the same node $v$ are sorted by their left endpoints. The
queries in $Q_r$ are sorted analogously, except that queries associated with the
same node are sorted by their right endpoints. This phase takes $O(\mathrm{sort}(|Q|) + \mathrm{scan}(|P|))$
I/Os and linear space.
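Phase $A_p^{(1)}$ for $k = 1$ can be sketched as follows. The Python code below is an in-memory illustration with assumptions of its own. The tree $T$ is implicit, given by index ranges over the sorted, distinct coordinates. Each separator $x_v$ is taken as the midpoint between the last point of the left subtree and the first point of the right one. The function name is hypothetical. A query that spans no separator is reported as `None`, since it contains at most one point.

```python
def split_queries(xs, queries):
    """xs: sorted, distinct x-coordinates of P.  queries: closed intervals (a, b).
    Returns, per query, (depth, x_v, left part, right part) for the highest node v
    of an implicit balanced binary tree whose separator x_v the query spans."""
    result = []
    for a, b in queries:
        lo, hi, depth, found = 0, len(xs), 0, None
        while hi - lo >= 2:                      # node holds points xs[lo:hi]
            mid = (lo + hi) // 2
            sep = (xs[mid - 1] + xs[mid]) / 2    # separates left and right subtree
            if b < sep:
                hi = mid                         # query lies in the left subtree
            elif a > sep:
                lo = mid                         # query lies in the right subtree
            else:
                found = (depth, sep, (a, sep), (sep, b))
                break
            depth += 1
        result.append(found)
    return result
```

Grouping the returned left parts by depth and separator gives the input of the per-level sweeps of Phase $A_r^{(1)}$.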

Phase $A_r^{(1)}$ answers left subqueries at each level $h$ of $T$ using a modification of
the approach of Sect. 2.2. In particular, the running product $\Pi$ is reset whenever
the sweep passes a value $x_v$ associated with a node $v$ at level $h$. This takes
$O(\mathrm{scan}(|Q_h| + |P|))$ I/Os, where $Q_h$ is the set of queries associated with nodes
at level $h$. Right subqueries are answered using a similar procedure. As there are
$\log_2 |P| = \phi_1(|P|)$ levels in $T$, and $\sum_h |Q_h| = |Q|$, it takes $O(\mathrm{scan}(|Q|) +
\mathrm{scan}(|P|)\phi_1(|P|))$ I/Os to answer all left and right subqueries. Once this is done,
Phase $A_p^{(2)}$ combines the answers to the left and right subqueries of each query
in $O(\mathrm{sort}(|Q|))$ I/Os. Choosing $t$ appropriately, we obtain the claim for $k = 1$.
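The following in-memory Python sketch illustrates the reset trick for one level $h$; function and parameter names are hypothetical. Events are processed from right to left. Resetting at every level-$h$ separator is enough: a left subquery $[a, x_v)$ lies inside the left child of $v$, and that child contains no other level-$h$ separator.

```python
def answer_left_subqueries(xs, labels, separators, left_queries, op, identity):
    """Answer all left subqueries [a, s) of one level h in one right-to-left sweep.

    xs, labels:   sorted x-coordinates of P and their labels
    separators:   separator values x_v of all nodes v at level h
    left_queries: pairs (a, s), where s is the separator of the query's node
    """
    SEP, POINT, QUERY = 0, 1, 2
    events = [(s, SEP, -1) for s in separators]
    events += [(x, POINT, i) for i, x in enumerate(xs)]
    events += [(a, QUERY, j) for j, (a, _) in enumerate(left_queries)]
    # Descending x; at equal x: reset first, then add the point, then report.
    events.sort(key=lambda e: (-e[0], e[1]))
    running = identity
    answers = [identity] * len(left_queries)
    for _, kind, idx in events:
        if kind == SEP:
            running = identity                   # start of a new node's left part
        elif kind == POINT:
            running = op(labels[idx], running)   # Pi_new = lambda(p) (x) Pi_old
        else:
            answers[idx] = running
    return answers
```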

For $k > 1$, we define the tree $T$ as follows: First associate the whole set $P$
with the root $r$ of $T$. Then split $P$ into $|P|/\phi_{k-1}(|P|)$ subsets of size $\phi_{k-1}(|P|)$,
and create one child of $r$ for each subset. Apply this strategy recursively to
the children of $r$. Consider a node $v$ with children $w_0, \ldots, w_s$. Let $x_1, \ldots, x_s$ be
values such that $x_i$ separates the points associated with $w_{i-1}$ from the points
associated with $w_i$. Again, associate a query $q \in Q$ with the highest node $v_q \in T$
such that $q$ spans some value $x_i$. Let $x_\ell$ and $x_r$ be the leftmost and rightmost
such values spanned by $q$, respectively. Split $q$ into three subqueries $q_l$, $q_m$, and
$q_r$ to the left of $x_\ell$, between $x_\ell$ and $x_r$, and to the right of $x_r$, respectively. Note
that $q_m$ does not exist if $x_\ell = x_r$. Now sort left and right subqueries by level
and within each level as for the case $k = 1$. This modified Phase $A_p^{(1)}$ still takes
at most $(t/2)(\mathrm{sort}(|Q|) + \mathrm{scan}(|P|))$ I/Os.

Phase $A_r^{(1)}$ now answers left and right subqueries as for $k = 1$. As the height
of $T$ is now $\phi_k(|P|)$, this takes at most $t(\mathrm{scan}(|Q|) + \mathrm{scan}(|P|)\phi_k(|P|))$ I/Os.
Phase $A_p^{(2)}$ combines the query results of $q_l$ and $q_r$ and stores the result with
$q_m$, in order to combine it with the answer to query $q_m$ later. This takes at
most $(t/2)\,\mathrm{sort}(|Q|)$ I/Os. The remainder of the algorithm answers the middle
subqueries of all queries in $Q$. Note that at every level of $T$, middle subqueries
now stretch between values $x_i$. Call the interval bounded by two consecutive
such values at the same level a slab. In order to answer middle subqueries at
a particular level $h$, we compute a new point set $P_h$ containing one point $p_\sigma$
per slab $\sigma$. The label associated with $p_\sigma$ is the product of the labels of all
points in $P \cap \sigma$. We can now answer middle subqueries with respect to $P_h$
without altering the solution. Due to the reduced size of $P_h$, this takes
$I_p^{(h)} \leq t(k-1)(\mathrm{sort}(|Q_h|) + \mathrm{scan}(|P_h|))$ and $I_r^{(h)} \leq t(k-1)\,\mathrm{scan}(|P| + |Q_h|)$ I/Os, by
the induction hypothesis. Summing over all $\phi_k(|P|)$ levels, and adding the costs
for answering left and right subqueries, we obtain the claim for $k > 1$. □
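The in-memory sketch below compresses a level to one point per slab and then answers a middle subquery against the compressed set. The names are hypothetical. Slab boundaries are assumed not to coincide with point coordinates, and `op` is assumed to be associative and commutative.

```python
import bisect
from functools import reduce

def compress_to_slabs(xs, labels, boundaries, op, identity):
    """P_h as a list: entry s is the product of the labels of the points in the
    slab [boundaries[s], boundaries[s+1]); points outside all slabs are ignored."""
    slab_labels = [identity] * (len(boundaries) - 1)
    for x, lab in zip(xs, labels):
        s = bisect.bisect_right(boundaries, x) - 1    # boundaries[s] <= x < boundaries[s+1]
        if 0 <= s < len(slab_labels):
            slab_labels[s] = op(slab_labels[s], lab)
    return slab_labels

def answer_middle(slab_labels, boundaries, left, right, op, identity):
    """Middle subquery between slab boundaries left < right: combine the
    representatives of the slabs between them instead of the original points."""
    i = bisect.bisect_left(boundaries, left)
    j = bisect.bisect_left(boundaries, right)
    return reduce(op, slab_labels[i:j], identity)
```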

Choosing $k = 1$ for $d = 1$, and $k = \alpha(|P|)$ for $d > 1$, we obtain an $O(\mathrm{sort}(|P| +
|Q|))$ I/O algorithm for $d = 1$, and an $O(\ldots)$ I/O algorithm for $d > 1$. Now partition
$P$ into contiguous subsets of size $\alpha(|P|)$ using $|P|/\alpha(|P|)$ splitters. Partition every
query into left, middle, and right subqueries so that the left and right subqueries
are maximized without spanning any splitter. For every query, answer the left
and right subqueries by scanning the respective portion of $P$. This takes $O(\ldots)$
I/Os. Every middle subquery stretches between two splitters. Hence, we can
represent every slab by a single point, thereby reducing the size of $P$ to
$|P|/\alpha(|P|)$, and we obtain the following theorem by applying Lem. 1 and Cor. 1.

Theorem 3. It takes $O(\mathrm{sort}(|P| + |Q|) + \ldots)$ I/Os
and linear space to answer a set $Q$ of range counting queries over a set $P$ of
points in $\mathbb{R}^d$.

The $K$-range minima problem is to report the $K$ points with minimum weight
in each query range. Modifying the scan in Sect. 2.2 to maintain the $K$ minima
seen so far, it is easy to generalize the algorithm of this section to solve the
$K$-range minima problem in $O(\mathrm{sort}(|P| + |Q|) + \ldots)$ I/Os, using $O(\ldots)$ blocks of
external memory.
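One possible in-memory version of the modified scan keeps the $K$ smallest weights seen so far in a bounded max-heap. This Python sketch assumes partially unbounded one-dimensional queries $[a, +\infty)$, as in Sect. 2.2. The function name is hypothetical.

```python
import heapq

def k_range_minima(points, queries, k):
    """points: (x, weight) pairs; queries: left endpoints a of ranges [a, +inf).
    Returns, per query, the k smallest weights among points with x >= a (ascending)."""
    order = sorted(range(len(queries)), key=lambda j: -queries[j])
    pts = sorted(points, key=lambda p: -p[0])        # sweep from +inf to -inf
    heap = []                                        # max-heap via negated weights
    answers = [None] * len(queries)
    i = 0
    for j in order:
        while i < len(pts) and pts[i][0] >= queries[j]:
            w = pts[i][1]
            if len(heap) < k:
                heapq.heappush(heap, -w)
            elif -heap[0] > w:                       # w beats the largest kept weight
                heapq.heapreplace(heap, -w)
            i += 1
        answers[j] = sorted(-v for v in heap)
    return answers
```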
3 Spanners for Point Sets

Computing $K$-th order $\theta$-graphs. Our algorithm for constructing the $K$-th order
$\theta$-graph iterates over all cones $c \in \mathcal{C}$ and computes the $K$ closest points w.r.t.
$\mathrm{dist}_c$ in $c(p)$ for every point $p \in P$. Using an affine transformation, cone $c$ can
be transformed into the range $[0, +\infty) \times [0, +\infty) \times \cdots \times [0, +\infty)$, and reporting
the $K$ points closest to $p$ in $c(p)$ translates into a $K$-range minima query for the
modified cone $c'(p)$. Using Thm. 2, we obtain the following result.
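In the plane, the affine transformation can be written down explicitly. If $u$ and $v$ are the boundary directions of a cone with angle less than $\pi$, writing $p - \text{apex} = s\,u + t\,v$ maps the cone onto the quadrant $s, t \geq 0$. The Python sketch below is restricted to two dimensions for brevity, which is an assumption, since the text works in $\mathbb{R}^d$. It uses Cramer's rule, and the names are illustrative.

```python
def to_cone_coordinates(u, v, apex, point):
    """Coordinates (s, t) with point - apex = s*u + t*v, where u and v are the
    boundary directions of a 2-D cone with angle < pi.  The cone is mapped to
    the quadrant [0, inf) x [0, inf)."""
    det = u[0] * v[1] - u[1] * v[0]          # nonzero because the cone angle is < pi
    dx, dy = point[0] - apex[0], point[1] - apex[1]
    s = (dx * v[1] - dy * v[0]) / det        # Cramer's rule
    t = (u[0] * dy - u[1] * dx) / det
    return s, t

def in_cone(u, v, apex, point):
    """Membership test in the transformed coordinates."""
    s, t = to_cone_coordinates(u, v, apex, point)
    return s >= 0 and t >= 0
```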

Theorem 4. Let $P$ be a set of $N$ points in $\mathbb{R}^d$ and $\theta < \pi$ be a constant angle.
Then it takes $O(\mathrm{sort}(N) + \ldots)$ I/Os and linear space to construct
the $K$-th order $\theta$-graph $G_{\theta,K}(P)$ of $P$.

For fixed $\theta$, the $\theta$-graph has bounded out-degree, but not necessarily bounded
in-degree. Using a two-phase approach, applying in both phases ideas similar
to those presented in [8], we can compute in $O(\mathrm{sort}(N))$ I/Os a spanner $G$ of
bounded in- and out-degree from a given $\theta$-graph $G'$.¹

Reporting paths in $\theta$-graphs. A spanner path between two points $s$ and $t$ in a
$\theta$-graph can be computed as follows: For every point $p \in P$, determine the cone
$c(p)$ containing $t$. Add the outgoing edge of $p$ in cone $c(p)$ to a graph $T$. $T$ is a tree
with root $t$ whose edges are directed towards the root. It follows directly from
the arguments applied in the analysis of the spanning ratio of $\theta$-graphs that the
path from $s$ to $t$ in $T$ is a spanner path. Hence, it is sufficient to report the path
from $s$ to $t$ in $T$, which takes $O(\mathrm{sort}(N))$ I/Os using time-forward processing [9].
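The tree $T$ can be traversed in memory by repeatedly following, from the current point, the $\theta$-graph edge in the cone that contains $t$. The following Python sketch makes several assumptions:

- the $\theta$-graph is planar, with $k$ equal cones of angle $2\pi/k$ (here $k = 8$, so $\theta = \pi/4$);
- distances are measured by projection onto the cone bisector;
- points are in general position.

It is an illustration, not the I/O-efficient time-forward-processing algorithm.

```python
import math

def cone_index(p, q, k):
    """Index of the cone with apex p (k cones of angle 2*pi/k) that contains q."""
    ang = math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)
    return min(int(ang // (2 * math.pi / k)), k - 1)

def theta_neighbor(points, p, c, k):
    """Out-neighbour of p in cone c: the point in that cone whose projection
    onto the cone bisector is closest to p."""
    mid = (c + 0.5) * 2 * math.pi / k
    bis = (math.cos(mid), math.sin(mid))
    best, best_d = None, math.inf
    for q in points:
        if q != p and cone_index(p, q, k) == c:
            d = (q[0] - p[0]) * bis[0] + (q[1] - p[1]) * bis[1]
            if d < best_d:
                best, best_d = q, d
    return best

def spanner_path(points, s, t, k=8):
    """Walk from s along the edge in the cone containing t until t is reached."""
    path = [s]
    while path[-1] != t and len(path) <= len(points):   # bound guards bad input
        p = path[-1]
        path.append(theta_neighbor(points, p, cone_index(p, t, k), k))
    return path
```

Since $t$ itself lies in the cone of $p$ that contains $t$, every step finds some neighbour.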

4 K-Closest Pairs

The following theorem, which we prove in the full paper, can be used to find the
$K$ closest pairs of a point set $P$ in $O(\mathrm{scan}(N\sqrt{K}))$ I/Os, once an $O(\sqrt{K})$-th
order $\theta$-graph has been constructed for $P$.

Theorem 5. Let $P$ be a set of $n$ points in $\mathbb{R}^d$, $0 < \theta < 2\arccos\sqrt{4/5}$, $1 \leq K \leq
n - 1$, $K^* = \max(K, K^2/4)$, and let $\{p, q\}$ be any of the $K^*$ closest pairs in $P$. Then
the $K$-th order $\theta$-graph contains an edge between $p$ and $q$.
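Choosing the order as $K' = \lceil 2\sqrt{K} \rceil$ gives $K'^2/4 \geq K$. By Theorem 5, provided its conditions on $\theta$ and $K'$ hold, the $K$ closest pairs therefore all appear as edges of the $K'$-th order $\theta$-graph, and they can then be picked out of the edge list. The minimal Python sketch below covers only that last step. It assumes that the edges are given as point pairs and that distances are Euclidean (`math.dist`, Python 3.8+). The function name is hypothetical.

```python
import math

def k_closest_from_edges(edges, k):
    """Return the k shortest distinct unordered pairs among the given edges.

    edges: iterable of (p, q), with p and q coordinate tuples, e.g. the edges of a
           K'-th order theta-graph; both directions of an edge are merged."""
    unique = {tuple(sorted((p, q))) for p, q in edges}
    return sorted(unique, key=lambda e: math.dist(e[0], e[1]))[:k]
```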

5 Spanners Among Polygonal Obstacles in the Plane 

In this section, we prove the following theorem. 

Theorem 6. Given a set $\mathcal{O}$ of polygonal obstacles in the plane, with a total of $N$
vertices, a linear-size $t$-spanner for $\mathcal{O}$ can be computed in $O(\ldots)$
I/Os using linear space.

¹ The spanning ratio of $G$ is $t = t' \cdot t'' \ldots$, where $t'$ is the spanning ratio of
$G'$ and $t''$ is the spanning ratio of the single-sink spanners used in constructing $G$.
In [11], it is shown that the $\theta$-graph $G_\theta(\mathcal{O})$ is such a $t$-spanner. Our algorithm to
construct $G_\theta(\mathcal{O})$ follows the framework of [11]. However, we need to change the
plane sweep substantially in order to perform it I/O-efficiently. For this purpose,
we have to prove some new properties of conical Voronoi diagrams. The I/O-complexity
of our algorithm becomes $O(\mathrm{sort}(N))$ if the endpoint dominance
problem [6] can be solved in $O(\mathrm{sort}(N))$ I/Os.

The algorithm iterates over all cones $c \in \mathcal{C}$ and computes the conical Voronoi
diagram $\mathrm{CVD}_c(\mathcal{O})$ (Fig. 1a), consisting of Voronoi regions $V_x$, $x \in P$. For each
such region $V_x$ and each point $p \in V_x$, $x$ is the closest visible obstacle vertex in
$c(p)$. Thus, for all obstacle vertices $y \in V_x$, the edge $(y, x)$ is an edge of $G_\theta(\mathcal{O})$.
Once $\mathrm{CVD}_c(\mathcal{O})$ is computed, we add all such edges to the edge set of $G_\theta(\mathcal{O})$.
In the full paper we show how to do this I/O-efficiently, so that Thm. 6 follows
from the following result, which we prove in the rest of this section.

Theorem 7. A representation of $\mathrm{CVD}_c(\mathcal{O})$ as vertex and edge sets can be
computed in $O(\ldots)$ I/Os using $O(N/B)$ blocks of external memory.

Assume that the coordinates have been transformed so that the cone axis of cone
$c$ points in the positive $y$-direction. The construction of $\mathrm{CVD}_c(\mathcal{O})$ uses a plane sweep
in this direction. Let $c'$ be the reverse cone of $c$, i.e., the set of points $p$ such that
the apex of $c'$ is contained in the cone $c(p)$. Then $V_x \subseteq c'(x)$. Denote the left and
right boundaries of $c'$ by $h_l$ and $h_r$, respectively (Fig. 1b). Let $H_l$ and $H_r$ be two
lines perpendicular to $h_l$ and $h_r$, respectively, and directed upward. We denote
the projections of a point $p$ onto $H_l$ and $H_r$ by $H_l(p)$ and $H_r(p)$, respectively.
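Under the conventions of this paragraph (cone axis along $+y$, reverse cone $c'$ opening downward), the two projection directions can be computed explicitly. The Python sketch below assumes a cone of angle $\theta < \pi$ and uses illustrative names. $H_l$ and $H_r$ are unit vectors orthogonal to $h_l$ and $h_r$ with positive $y$-component, so $H_l$ points up and to the left. A short calculation shows that $p \in c'(x)$ exactly when $H_l(p) \leq H_l(x)$ and $H_r(p) \leq H_r(x)$, which is what the last function checks.

```python
import math

def projection_axes(theta):
    """Unit vectors of H_l and H_r for a cone c of angle theta < pi whose axis is +y.

    The reverse cone c' opens downward, with boundary directions
    h_l = (-sin(theta/2), -cos(theta/2)) and h_r = (sin(theta/2), -cos(theta/2)).
    H_l and H_r are perpendicular to h_l and h_r and point upward."""
    s, c = math.sin(theta / 2), math.cos(theta / 2)
    return (-c, s), (c, s)

def project(axis, p):
    """Signed projection of p onto the (unit) axis."""
    return axis[0] * p[0] + axis[1] * p[1]

def in_reverse_cone(theta, x, p):
    """p lies in c'(x) iff H_l(p) <= H_l(x) and H_r(p) <= H_r(x)."""
    hl, hr = projection_axes(theta)
    return project(hl, p) <= project(hl, x) and project(hr, p) <= project(hr, x)
```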

For the sake of simplicity, assume that the scene contains a dummy obstacle
bounded from above by a horizontal line $l_{\mathrm{bot}}$ with $y$-coordinate $-\infty$. Now every
cone $c'(x)$ becomes a triangle bounded by $h_l(x)$, $h_r(x)$, and a part of $l_{\mathrm{bot}}$. As
$V_x \subseteq c'(x)$, $V_x$ is bounded from the left and right by $h_l(x)$ and $h_r(x)$ and from
below by a polygonal chain consisting of parts of obstacle edges and/or parts of
cone boundaries $h_l(y)$ and $h_r(y)$, for vertices $y \in P$ below $x$.

During the sweep, we maintain the invariant that the Voronoi regions $V_y$,
for all vertices $y \in P$ below the sweep line $\ell$, have been computed. Let $P_\ell$
be the set of vertices below $\ell$ and $\mathcal{O}_\ell$ be the set of obstacles in $\mathcal{O}$ below or
intersecting $\ell$. Define the region $R_\ell = \bigcup_{y \in P_\ell} V_y \cup \bigcup_{o \in \mathcal{O}_\ell} o$. As $R_\ell$ contains the
dummy obstacle, it extends to infinity in both $x$-directions and in the negative
$y$-direction. The following lemma is proved in the full paper.

Lemma 2. Region $R_\ell$ is connected.

Lem. 2 implies that $R_\ell$ is bounded from above by a polygonal chain, even though
it may contain holes. We call the upper boundary of $R_\ell$ the horizon of sweep
line $\ell$ and denote it by $U_\ell$ (Fig. 1a).

Fig. 1. (a) A conical Voronoi diagram with the horizon $U_\ell$ for sweep line $\ell$ shown in
bold. Obstacles are shaded dark grey. Regions $V_y$ for points $y$ below the sweep line are
shaded light grey. White regions below the horizon are holes in $R_\ell$. (b) Illustrations of
various definitions.

Let $x$ be the current vertex on the sweep line $\ell$, whose Voronoi region we
want to compute. Let $p_l$ be the first intersection point of $h_l(x)$ with $U_\ell$ if $h_l(x)$
does not intersect the interior of an obstacle before intersecting $U_\ell$. Otherwise,
let $p_l = x$. We define a point $p_r$ analogously with respect to $h_r(x)$. Then the
edges $h_l^{\mathrm{vor}}(x) = (x, p_l)$, $h_r^{\mathrm{vor}}(x) = (x, p_r)$, and the part $U_\ell(p_l, p_r)$ of $U_\ell$ between
$p_l$ and $p_r$ bound a polygonal region $R_x$. Edge $h_l^{\mathrm{vor}}(x)$ ($h_r^{\mathrm{vor}}(x)$) is not defined if
$p_l = x$ ($p_r = x$), and $R_x = \emptyset$ if $p_l = p_r = x$. The following lemma is shown in
the full paper.

Lemma 3. The region $R_x$ defined as above is the Voronoi region of $x$. That is,
$R_x = V_x$.

Consider the region $V_x$. Lem. 3 implies that when the sweep line $\ell$ reaches vertex
$x$, the whole boundary of $V_x$, except $h_l^{\mathrm{vor}}(x)$ and $h_r^{\mathrm{vor}}(x)$, has already been
computed. To make the description of $V_x$ complete, we need to compute $h_l^{\mathrm{vor}}(x)$
and $h_r^{\mathrm{vor}}(x)$. This can be done by performing two ray-shooting queries onto $U_\ell$
in the directions of $h_l(x)$ and $h_r(x)$.

We answer ray-shooting queries in two stages. The first stage determines
the points on obstacle boundaries hit by $h_l(x)$ and $h_r(x)$. The second stage
determines whether $h_l(x)$ and $h_r(x)$ hit Voronoi edges $h_r^{\mathrm{vor}}(y)$ and $h_l^{\mathrm{vor}}(y')$ before
hitting the respective obstacles.

The first stage is equivalent to solving the endpoint dominance problem for
the set of obstacle vertices, which takes $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os [6]. Consider
rays $h_l(x)$ and $h_r(x)$, for a vertex $x$ on the boundary of an obstacle $o \in \mathcal{O}$.
Let $h_l^{\mathrm{obs}}(x) = (x, y)$, where $y$ is the first intersection of $h_l(x)$ with an obstacle.
If $h_l(x) \setminus \{x\}$ intersects the interior of $o$ before intersecting the boundary of
$o$, $h_l^{\mathrm{obs}}(x)$ is not defined. We define $h_r^{\mathrm{obs}}(x)$ analogously. We say that segment
$e = (x, y)$ hits segment $e' \in U_\ell$ if $e$ and $e'$ intersect in a point $p$ and segment
$(x, p)$ does not intersect any other segment $e'' \in U_\ell$.
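The "hits" relation can be checked by brute force: among all segments that $e = (x, y)$ intersects, take the one met first when walking from $x$. This Python sketch is an in-memory illustration with hypothetical names. It treats parallel segments as non-intersecting and makes no attempt at the I/O-efficient machinery of the paper.

```python
def intersect_param(x, y, a, b):
    """Parameter t in [0, 1] with x + t*(y - x) on segment ab, or None.
    Parallel (including collinear) segments are treated as non-intersecting."""
    dx, dy = y[0] - x[0], y[1] - x[1]
    ex, ey = b[0] - a[0], b[1] - a[1]
    den = dx * ey - dy * ex                 # cross product of the two directions
    if den == 0:
        return None
    wx, wy = a[0] - x[0], a[1] - x[1]
    t = (wx * ey - wy * ex) / den           # position along (x, y)
    u = (wx * dy - wy * dx) / den           # position along (a, b)
    return t if 0 <= t <= 1 and 0 <= u <= 1 else None

def hit_segment(x, y, segments):
    """The segment of `segments` hit by e = (x, y): the one intersected first."""
    best, best_t = None, None
    for seg in segments:
        t = intersect_param(x, y, *seg)
        if t is not None and (best_t is None or t < best_t):
            best, best_t = seg, t
    return best
```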

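Observation 1. The ray $h_l(x)$ ($h_r(x)$) hits a segment $h_r^{\mathrm{vor}}(y)$ ($h_l^{\mathrm{vor}}(y)$) if and
only if segment $h_l^{\mathrm{obs}}(x)$ ($h_r^{\mathrm{obs}}(x)$) hits segment $h_r^{\mathrm{vor}}(y)$ ($h_l^{\mathrm{vor}}(y)$).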

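If $h_l(x)$ does not hit an obstacle edge in $U_\ell$, it hits a segment $h_r^{\mathrm{vor}}(y)$, because
every segment $h_l^{\mathrm{vor}}(y)$ lies on $h_l(y)$ and is therefore parallel to $h_l(x)$. Analogously,
$h_r(x)$ hits a segment $h_l^{\mathrm{vor}}(y)$ if it does not hit an obstacle edge in $U_\ell$. We show
how to find the segment $h_r^{\mathrm{vor}}(y)$ hit by $h_l^{\mathrm{obs}}(x)$, and the segment $h_l^{\mathrm{vor}}(y)$ hit by
$h_r^{\mathrm{obs}}(x)$, if any.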
We do this in two separate plane sweeps. The first sweep finds the segments
$h_l^{\mathrm{vor}}(y)$ hit by segments $h_r^{\mathrm{obs}}(x)$, for all vertices $x \in P$. The second sweep finds
all segments $h_r^{\mathrm{vor}}(y)$ hit by segments $h_l^{\mathrm{obs}}(x)$. As these two sweeps are similar, we
only describe the first one. The difficulty is that segments $h_l^{\mathrm{vor}}(y)$ are not known
before the first sweep; they are only computed by the second sweep. However,
the goal of the first sweep is not to compute segments $h_l^{\mathrm{vor}}(y)$, but to determine
whether a given segment $h_r^{\mathrm{obs}}(x)$ hits such a segment $h_l^{\mathrm{vor}}(y)$. The following
lemma provides the tool to make this decision without knowing segments $h_l^{\mathrm{vor}}(y)$
explicitly. For a given vertex $x$ on sweep line $\ell$, let the hiding set of $h_l^{\mathrm{obs}}(x)$ be the
set of endpoints $z'$ of segments $h_l^{\mathrm{obs}}(z')$ with $H_l(z') > H_l(x)$ and $y(z') < y(x)$.²
The following lemma is proved in the full paper.

Lemma 4. Given a vertex $x$ on sweep line $\ell$, let $h_l^{\mathrm{obs}}(x) = (x, y)$. Let $z$ be the
segment endpoint in the hiding set of $h_l^{\mathrm{obs}}(x)$ such that $H_r(z)$ is maximized. If
the hiding set of $h_l^{\mathrm{obs}}(x)$ is empty, let $H_r(z) = -\infty$. Then there can be no vertex
$u$ above $\ell$ with $H_r(u) < \max\{H_r(y), H_r(z)\}$ such that $h_r^{\mathrm{obs}}(u)$ hits $h_l^{\mathrm{vor}}(x)$.

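Based on Lem. 4, we shorten segments $h_l^{\mathrm{obs}}(x)$ to segments $h_l^{\mathrm{sh}}(x)$ defined as
follows: If point $z$ in Lem. 4 exists and $H_r(z) > H_r(y)$, then $h_l^{\mathrm{sh}}(x) = (x, y')$,
where $y'$ is the point on $h_l^{\mathrm{obs}}(x)$ with $H_r(y') = H_r(z)$. Otherwise, $h_l^{\mathrm{sh}}(x) =
h_l^{\mathrm{obs}}(x)$. Observe that $h_l^{\mathrm{sh}}(x) \subseteq h_l^{\mathrm{vor}}(x)$.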

Corollary 2. For a vertex $x \in P$, segment $h_r^{\mathrm{obs}}(x)$ hits segment $h_l^{\mathrm{vor}}(y)$ if and
only if it hits $h_l^{\mathrm{sh}}(y)$.

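Next we describe how to compute segments $h_l^{\mathrm{sh}}(y)$ from the given set of segments
$h_l^{\mathrm{obs}}(y)$. Given segments $h_l^{\mathrm{sh}}(y)$, we show how to compute the segment $h_l^{\mathrm{sh}}(x)$
(and thus segment $h_l^{\mathrm{vor}}(x)$) hit by $h_r^{\mathrm{obs}}(u)$ (if any), for all $u \in P$. Once we know
the intersection point of $h_r^{\mathrm{obs}}(u)$ with this segment $h_l^{\mathrm{sh}}(x)$, we can shorten $h_r^{\mathrm{obs}}(u)$
to $h_r^{\mathrm{vor}}(u)$.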

Lem. 4 says that in order to compute $h_l^{\mathrm{sh}}(x)$, we need to find the segment
endpoint $z$ below $\ell$ with $H_l(z) > H_l(x)$ such that $H_r(z)$ is maximized over
all such points. This is a partially unbounded range-maxima problem, which can
be solved in $O(\mathrm{sort}(N))$ I/Os by Thm. 2.
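The query needed for Lem. 4 maximizes $H_r(z)$ over endpoints $z$ with $y(z) < y(x)$ and $H_l(z) > H_l(x)$, which is a two-sided dominance-maximum query. The Python sketch below is an in-memory stand-in for the external-memory method of Thm. 2. It sweeps in increasing $y$ and keeps a Fenwick tree of prefix maxima, indexed by the rank of $H_l$ in decreasing order. The input format and names are assumptions of the sketch.

```python
def hiding_maxima(items):
    """items[j] = (y, hl, hr): the values y(z), H_l(z), H_r(z) of an endpoint z.
    For every item x, return the maximum H_r(z) over items z with y(z) < y(x)
    and H_l(z) > H_l(x), or -inf if no such z exists."""
    NEG = float("-inf")
    # Rank H_l values in decreasing order, so "H_l(z) > H_l(x)" becomes a prefix.
    distinct = sorted({hl for _, hl, _ in items}, reverse=True)
    rank = {v: i + 1 for i, v in enumerate(distinct)}     # 1-based Fenwick positions
    size = len(distinct)
    tree = [NEG] * (size + 1)

    def update(i, value):            # position i takes max(old, value)
        while i <= size:
            tree[i] = max(tree[i], value)
            i += i & -i

    def prefix_max(i):               # maximum over positions 1..i
        best = NEG
        while i > 0:
            best = max(best, tree[i])
            i -= i & -i
        return best

    order = sorted(range(len(items)), key=lambda j: items[j][0])
    answers = [NEG] * len(items)
    start = 0
    while start < len(order):
        # Items with equal y form one group: they must not see each other.
        end = start
        while end < len(order) and items[order[end]][0] == items[order[start]][0]:
            end += 1
        for j in order[start:end]:
            answers[j] = prefix_max(rank[items[j][1]] - 1)   # strictly larger H_l
        for j in order[start:end]:
            update(rank[items[j][1]], items[j][2])
        start = end
    return answers
```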

Given segments $h_l^{\mathrm{sh}}(x)$, ray-shooting queries for rays $h_r^{\mathrm{obs}}(u)$ can be answered
in $O(\mathrm{sort}(N))$ I/Os using the distribution sweeping paradigm [13]: The slabs used
in the recursive partition of the plane are parallel to $h_r$. The sweep, however,
still proceeds in the positive $y$-direction. For every slab $\sigma$, maintain the segment $\mu(\sigma)$
among all segments $h_l^{\mathrm{sh}}(x)$ seen so far which maximizes $H_l(x)$ and completely
spans $\sigma$. When the sweep reaches a new point $u$, determine the slab $\sigma$ containing
$u$, decide whether $h_r^{\mathrm{obs}}(u)$ hits $\mu(\sigma)$, and update $\mu(\sigma')$ for all slabs $\sigma'$ completely
spanned by $h_l^{\mathrm{sh}}(u)$.

Given segments $h_l^{\mathrm{vor}}(x)$, $h_r^{\mathrm{vor}}(x)$, and the obstacle edges, it now takes sorting
and scanning to extract the vertex and edge sets of $\mathrm{CVD}_c(\mathcal{O})$.



² Remember that $H_l$ is directed to the left.
Abstract. We study the model-checking problem for classes of message
sequence charts (MSCs) defined by two extensions of message sequence 
graphs (MSGs). These classes subsume the class of regular MSC lan- 
guages. We show that the model checking problem for these extended 
message sequence graphs against monadic second-order specifications is 
decidable. Moreover, we present two ways to model-check the extended 
classes — one extends the proof for MSGs while the other extends the 
proof for regular MSC languages. 



1 Introduction 

Message sequence charts (MSCs) are an ITU-standardized notation widely used to
capture system requirements in the early stages of design of communication pro- 
tocols [ITU97,RGG96]. The clear graphical layout of an MSC describes how the 
components of a distributed system communicate by exchanging messages. In its 
simplest form, an MSC depicts the exchange of messages between processes of a 
distributed system and corresponds to a single partially-ordered execution of the 
system. A standard way to specify a collection of scenarios is by using message 
sequence graphs (MSGs) (also known as high-level MSCs). MSGs allow MSCs to 
be combined using the operations of choice, concatenation and repetition. 

In [AY99], the authors study the (linearization) model-checking problem for 
message sequence graphs. One is given an MSG and a regular set of sequences of 
actions as the specification and asked whether for every MSC represented by the 
MSG, all the linearizations of the MSC are included in the specification. This 
problem is shown to be undecidable, and the authors go on to define a restricted 
class of MSGs for which the problem is decidable. 

The model-checking problem of MSGs against monadic second-order (MSO)
formulas, which are interpreted directly on MSCs and not on their linearizations,
was studied in [Mad01]. It was shown that the model-checking problem is
decidable without any restriction on the MSGs. Independently, Peled has shown
that model-checking a fragment of TLC (a temporal logic on partial orders) is
solvable for unrestricted MSGs [Pel00]. These results suggest that structural
logics may be more amenable to verification than those based on linearizations.

In [HMKT00b], the authors study another class of MSCs called regular MSC-languages.
It turns out that, using a proof in [HMKT00b], one can show that the
model-checking problem for regular MSC languages against MSO specifications is
decidable as well. However, the class of MSC-languages defined by MSGs and the
class of regular MSC-languages are incomparable: there are MSC-languages
that are definable in one but not in the other [HMKT00a].

Thus, though the model-checking problem for MSO formulas is decidable 
for MSC-languages presented as MSGs as well as for regular MSC languages, 
the proof methodologies seem very different. While the former works by viewing 
concatenated MSCs as meta-strings where each letter represents an MSC, the 
latter works using the set of all linearizations of the MSCs. 

We unify the proof mechanisms in the above works by considering a larger
class of MSCs through the formalism of compositional Message Sequence Graphs
(CMSGs) introduced in [GMP01]. These are similar to MSGs, except that one
is allowed to have unmatched send-events (send-events whose corresponding
receive-events do not occur in the current atomic MSC) while forming an MSC,
which will be matched with other receive-events in other MSCs. In [GMP01] it
was shown that this is a strictly more expressive formalism than that of an MSG,
and that it allows modelling of interesting scenarios such as the behaviour of the
alternating-bit protocol.

In terms of linearizations, a regular MSC language is one such that the set of all
linearizations of all MSCs in it is a regular set of sequences [HMKT00b]. We can
relax this condition and represent an MSC-language by a subset L of linearizations
of MSCs in the MSC-language such that L has at least one linearization for
each MSC in the language. When L is regular, we say that the MSC-language has
a regular representative linearization. It then turns out that the class of
CMSG-definable languages is precisely the class of MSC-languages that have a regular
representative linearization. Consequently, this class of languages subsumes the
class of MSG-definable languages as well as regular MSC-languages.

We show that the model-checking problem for CMSGs against MSO specifications
is decidable. More importantly, we identify two ways of solving this
problem: one extending the method used to model-check MSGs in [Mad01],
and the other extending the technique implicit in [HMKT99] to model-check
regular MSC-languages.

We then go on to define and study a formalism more general and flexible
than a CMSG, called extended CMSGs (XCMSGs). In this setting, apart from
unmatched send-events, one is also allowed to have unmatched receive-events
when traversing a path in the MSG. We show that the notion analogous to
linearizations in this setting is that of semi-linearizations: an ordering of events
which respects the local total orders of the MSC but does not necessarily respect
the causal order. We then show theorems analogous to the above results, namely
the equivalence between MSC-languages defined by XCMSGs and those that have
regular representative semi-linearizations, and the extension of both decidability
proofs for the model-checking problem to this class.

We believe our results will help to understand the nuances involved in
model-checking classes of MSC-languages, which borders on undecidability.
On the theoretical side, checking MSO properties of infinite graphs and infinite
classes of graphs is an interesting area of research [Tho97]. Our proofs using
linearizations and semi-linearizations seem to suggest a new technique to
tackle this problem for certain classes of graphs.

Due to space constraints, we present only the gist of proofs; more details and
related results (such as extensions to infinite MSCs) can be found in
[MM01].

2 Preliminaries 

Let us fix a finite set of processes $\mathcal{P}$ for the rest of the paper and let $p, q, r$ range
over $\mathcal{P}$. Let us also fix a finite set of messages $\Gamma_m$ and a finite set of local actions
$\Gamma_l$. Let $\Sigma_p = \{(p!q, a), (p?q, a), (p, l) \mid p, q \in \mathcal{P}, p \neq q, a \in \Gamma_m, l \in \Gamma_l\}$ denote
the set of actions that $p$ participates in. The action $(p!q, a)$ should be read as
"p sends the message a to q", while the action $(p?q, a)$ means "p receives the
message a from q". $(p, l)$ represents $p$ doing a local action $l$. Let $\Sigma = \bigcup_{p \in \mathcal{P}} \Sigma_p$
denote the set of all actions.

Definition 1. A message sequence chart (MSC) over $\mathcal{P}$ is a tuple
$m = (E, \{\leq_p\}_{p \in \mathcal{P}}, \lambda, \eta)$, where

– $E$ is a finite set of events.

– $\lambda : E \to \Sigma$ is the labelling function, which identifies an action for each event.
Let $E_p = \{e \in E \mid \lambda(e) \in \Sigma_p\}$ denote the set of events of $E$ which $p$
participates in. Also, let $S_E = \{e \in E \mid \lambda(e) = (p!q, a) \text{ for some } p, q \in \mathcal{P}, a \in
\Gamma_m\}$ denote the set of send-events and $R_E = \{e \in E \mid \lambda(e) = (p?q, a) \text{ for
some } p, q \in \mathcal{P}, a \in \Gamma_m\}$ denote the set of receive-events of $E$.

– $\eta : S_E \to R_E$ is the matching function, which associates with each send-event
its corresponding receive-event. We require $\eta$ to be a bijection, and for every
$e, e' \in E$, if $\eta(e) = e'$, then $\lambda(e) = (p!q, a)$ and $\lambda(e') = (q?p, a)$ for some
$p, q \in \mathcal{P}$, $a \in \Gamma_m$.

– $\leq_p$ is a total order on $E_p$ for each $p \in \mathcal{P}$.

– Let $\lessdot = (\bigcup_{p \in \mathcal{P}} \leq_p) \cup \{(e, e') \mid e, e' \in E \text{ and } \eta(e) = e'\}$. Let $\leq = (\lessdot)^*$ be the
transitive closure of $\lessdot$. Then $\leq$ denotes the causal ordering of events in the
MSC, and we require it to be a partial order on $E$.

– (non-degeneracy) $m$ is non-degenerate ([AEY00]) in the sense that two identical
messages sent by a process are not received in the reverse order, i.e., if
there are events $e_1, e_2, e'_1, e'_2$ such that $\lambda(e_1) = \lambda(e_2) = (p!q, a)$, $\eta(e_1) = e'_1$,
$\eta(e_2) = e'_2$ and $e_1 <_p e_2$, then it must be the case that $e'_1 <_q e'_2$. □

An event-linearization of an MSC $m$ is a linear order on $E$ which extends
$\leq$, the causal order of $m$. We represent event-linearizations as sequences over
$E$: a sequence $e_1, \ldots, e_k$ represents the event-linearization $\sqsubseteq$ given by $e_i \sqsubseteq e_j$ iff
$i \leq j$.

A linearization of an MSC $m$ is any possible sequence of actions which the
MSC describes. Formally, $\mathit{lin}(m)$, the set of linearizations of $m$, is the set of
sequences $\lambda(e_1), \ldots, \lambda(e_k)$, where $e_1, \ldots, e_k$ is an event-linearization of $m$.
Linearizations are represented as words in $\Sigma^*$.

Note that while an MSC defines a nonempty set of linearizations, one can
associate a unique MSC (up to isomorphism) with a given linearization by defining
an appropriate set of events and matching the sends and receives using the
non-degeneracy condition for an MSC. For a linearization $w \in \Sigma^*$, we can define
the MSC corresponding to $w$ as $(E, \{\leq_p\}_{p \in \mathcal{P}}, \lambda, \eta)$, where $E$ is the set of all
non-empty prefixes of $w$; $\lambda(x.r) = r$, where $x.r \in E$, $x \in \Sigma^*$, $r \in \Sigma$; $x.r \leq_p y.r'$
iff $x.r$ is a prefix of $y.r'$ and $r, r' \in \Sigma_p$; and $\eta(x.r) = y.r'$ iff $r = (p!q, a)$, $r' = (q?p, a)$,
and the number of occurrences of $r$ in $x$ is equal to the number of occurrences
of $r'$ in $y$.
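The construction above can be sketched in Python. The action encoding and the names are assumptions of this sketch:

- `('!', p, q, a)` stands for $(p!q, a)$;
- `('?', p, q, a)` stands for $(p?q, a)$, that is, $p$ receives $a$ from $q$;
- `('l', p, name)` stands for a local action.

Position $i$ of $w$ stands for the prefix of length $i + 1$. The $k$-th occurrence of $(p!q, a)$ is matched with the $k$-th occurrence of $(q?p, a)$, which is exactly the counting condition above. The input is assumed to be well-formed.

```python
from collections import defaultdict

def msc_from_linearization(w):
    """Returns (process_orders, eta): the events (positions of w) of each process in
    order, and the matching of send positions to receive positions."""
    process_orders = defaultdict(list)
    sends = defaultdict(list)        # (p, q, a) -> positions of (p!q, a), in order
    recvs = defaultdict(list)        # (p, q, a) -> positions of (q?p, a), in order
    for i, act in enumerate(w):
        process_orders[act[1]].append(i)       # act[1] is the acting process
        if act[0] == '!':
            _, p, q, a = act                   # p sends a to q
            sends[(p, q, a)].append(i)
        elif act[0] == '?':
            _, q, p, a = act                   # q receives a from p
            recvs[(p, q, a)].append(i)
    # The k-th (p!q, a) is matched with the k-th (q?p, a).
    eta = {}
    for key, s_list in sends.items():
        for s, r in zip(s_list, recvs[key]):
            eta[s] = r
    return dict(process_orders), eta
```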

For any $w \in \Sigma^*$, if for every $p, q \in \mathcal{P}$, $a \in \Gamma_m$, and every prefix $y$ of $w$, the
number of occurrences of $(p!q, a)$ in $y$ is at least the number of occurrences of
$(q?p, a)$ in $y$, and the number of occurrences of $(p!q, a)$ in $w$ is the same as the
number of occurrences of $(q?p, a)$ in $w$, then one can associate an MSC $m$ for
which $w$ is a linearization. We call such words well-formed words. For such a
$w \in \Sigma^*$, let $\mathit{msc}(w)$ denote the MSC corresponding to $w$.
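Well-formedness can be checked in one left-to-right pass. For every triple $(p, q, a)$, keep the difference between the numbers of $(p!q, a)$ and $(q?p, a)$ seen so far. The sketch below uses a hypothetical tuple encoding of actions: `('!', p, q, a)` for $(p!q, a)$, `('?', q, p, a)` for $(q?p, a)$, and `('l', p, name)` for local actions, which are ignored.

```python
from collections import Counter

def is_well_formed(w):
    """Every prefix has at least as many (p!q, a) as (q?p, a) occurrences,
    and the whole word has equally many, for all p, q and a."""
    balance = Counter()                 # (p, q, a) -> #(p!q, a) - #(q?p, a)
    for act in w:
        if act[0] == '!':
            _, p, q, a = act
            balance[(p, q, a)] += 1
        elif act[0] == '?':
            _, q, p, a = act            # q receives a from p
            balance[(p, q, a)] -= 1
            if balance[(p, q, a)] < 0:  # a receive with no earlier matching send
                return False
    return all(v == 0 for v in balance.values())
```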

We can now represent a collection of MSCs through a set of representative
linearizations. Let $L \subseteq \Sigma^*$ be a set of well-formed words. Then $L$ represents the
class of MSCs $\mathit{msc}(L) = \{\mathit{msc}(w) \mid w \in L\}$. Note that we do not require $L$ to
contain all the linearizations of the MSCs it represents (i.e., $L$ need not be equal to
$\mathit{lin}(\mathit{msc}(L))$), as is required for regular MSC languages [HMKT00b]. We say that
$L$ is a representative linearization of a collection of MSCs if $L$ represents this
collection (up to isomorphism).

Another way to define a collection of MSCs is through the notion of a Message
Sequence Graph (MSG). This is a mechanism whereby one takes a finite set of
MSCs and defines regular ways of combining them using choice, concatenation
and repetition. In this paper, we will consider a more general notion of MSGs,
called compositional MSGs, or CMSGs.

Let us first define a compositional MSC (CMSC). A CMSC is basically an
MSC which, in addition to the normal messages in an MSC, can also have
unmatched send and receive events; these will be matched up later using
corresponding unmatched receive and send events, respectively, when the CMSC is
composed with another.

Definition 2. A compositional message sequence chart (CMSC) over $\mathcal{P}$ is a
tuple $m = (E, \{\leq_p\}_{p \in \mathcal{P}}, \lambda, \eta)$, where $E$, $\lambda$ and $\leq_p$ are as in an MSC and
$\eta : S_E \rightharpoonup R_E$ is a partial matching function, which associates with some send-events
their corresponding receive-events. We require $\eta$ to be injective, and for every $e, e' \in E$,
if $\eta(e) = e'$, then $\lambda(e) = (p!q, a)$ and $\lambda(e') = (q?p, a)$ for some $p, q \in \mathcal{P}$, $a \in \Gamma_m$.
Using the events on which $\eta$ is defined, we can define the causal order, as in
an MSC. We require this to be a partial order. We also require $m$ to be
non-degenerate (with respect to the messages matched by $\eta$). □

We say that $e \in S_E$ is an unmatched send-event if $\eta$ is not defined on $e$, and
$e \in R_E$ is an unmatched receive-event if $e \notin \eta(S_E)$. For a CMSC $m$ and $(p!q, a) \in
\Sigma$, the unmatched send-events of the type $(p!q, a)$ are counted as
$|\{e \in E \mid \lambda(e) = (p!q, a) \text{ and } \eta(e) \text{ is not defined}\}|$.

Concatenation of two CMSCs is done by (asynchronously) concatenating the 
events of each process, and by matching up some unmatched send-events of the 
first CMSC with corresponding unmatched receive-events of the second. 

Definition 3. Let $m_i = (E_i, \{\le^i_p\}_{p \in \mathcal{P}}, \lambda_i, \eta_i)$, $i = 1, 2$, be two CMSCs with $E_1 \cap E_2 = \emptyset$. Also, let $m_1$ have no unmatched receives. Let $m = (E, \{\le_p\}_{p \in \mathcal{P}}, \lambda, \eta)$ where $E = E_1 \cup E_2$; $\lambda(e) = \lambda_1(e)$ for all $e \in E_1$ and $\lambda(e) = \lambda_2(e)$ for all $e \in E_2$; $\le_p \,=\, \le^1_p \cup \le^2_p \cup \{(e_1, e_2) \mid e_1 \in E_1 \cap E_p,\ e_2 \in E_2 \cap E_p\}$; and $\eta$ is defined as follows: 

(C1) If $e \in S_{E_i}$ and $\eta_i(e)$ is defined, where $i \in \{1, 2\}$, then $\eta(e) = \eta_i(e)$. 

(C2) Let $e \in S_{E_1}$, let $\eta_1$ be undefined on $e$, and let $\lambda(e) = (p!q, a)$. If there is an event $e' \in R_{E_2}$ such that $\eta_2$ is undefined on $e'$ (i.e. $e' \notin \eta_2(S_{E_2})$), $\lambda(e') = (q?p, a)$ and $|\{f \in E \mid f \le_p e,\ \lambda(f) = (p!q, a)\}| = |\{f' \in E \mid f' \le_q e',\ \lambda(f') = (q?p, a)\}|$, then we set $\eta(e) = e'$. Clearly, if such an $e'$ exists, it is unique. 
(C3) $\eta$ is undefined on all other events of $S_E$. 

If $m$ is non-degenerate, then it is easy to see that $m$ is a CMSC, and we define the concatenation of $m_1$ and $m_2$, denoted by $m_1 \cdot m_2$, to be $m$. Otherwise, concatenation is not defined. □ 
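The matching rules (C1) to (C3) amount to a FIFO pairing per message type: the $k$-th send of type $(p!q, a)$ on $p$ meets the $k$-th receive of type $(q?p, a)$ on $q$, counted in the concatenated local orders. The following is a minimal sketch under assumed conventions (a hypothetical `CMSC` class storing each process's events as a list in local order, with globally unique event ids); it computes $\eta$ only and deliberately omits the non-degeneracy and partial-order checks that decide whether $m_1 \cdot m_2$ is defined.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CMSC:
    # events[p] lists the events of process p in their local order <=_p.
    # Each event is (event_id, kind, peer, msg), kind "!" (send to peer) or
    # "?" (receive from peer). Ids are assumed globally unique (E1, E2 disjoint).
    events: dict
    # eta: partial matching from send-event ids to receive-event ids.
    eta: dict = field(default_factory=dict)

    def ids(self):
        return {e[0] for evs in self.events.values() for e in evs}

    def unmatched_receives(self):
        matched = set(self.eta.values())
        return [e for evs in self.events.values() for e in evs
                if e[1] == "?" and e[0] not in matched]


def concat(m1, m2):
    """Build eta for m1 . m2 via (C1)-(C3); None if m1 has unmatched
    receives. Non-degeneracy and acyclicity are NOT checked here."""
    if m1.unmatched_receives():
        return None
    procs = set(m1.events) | set(m2.events)
    # Asynchronous concatenation: on each process, m1's events come first.
    events = {p: m1.events.get(p, []) + m2.events.get(p, []) for p in procs}
    eta = {**m1.eta, **m2.eta}  # (C1): existing matchings are kept
    ids1, ids2 = m1.ids(), m2.ids()
    matched2 = set(m2.eta.values())
    # Index unmatched receives of m2 by (sender, receiver, msg, k), where k
    # counts receives of type (q?p, msg) on q in the concatenated order.
    pending = {}
    for q, evs in events.items():
        seen = Counter()
        for eid, kind, peer, msg in evs:
            if kind == "?":
                seen[(peer, msg)] += 1
                if eid in ids2 and eid not in matched2:
                    pending[(peer, q, msg, seen[(peer, msg)])] = eid
    # (C2): the k-th send of type (p!q, msg) meets the k-th such receive.
    for p, evs in events.items():
        seen = Counter()
        for eid, kind, peer, msg in evs:
            if kind == "!":
                seen[(peer, msg)] += 1
                key = (p, peer, msg, seen[(peer, msg)])
                if eid in ids1 and eid not in m1.eta and key in pending:
                    eta[eid] = pending[key]
    # (C3): every other send stays unmatched.
    return CMSC(events, eta)
```

For instance, if `m1` holds two unmatched sends from `p` to `c` and two single-receive CMSCs are appended one after the other, the first send is matched by the first concatenation and the second send by the next one, illustrating the left-to-right reading used below.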

When we write a series of concatenations $m_1 \cdot m_2 \cdot m_3 \cdots$ we always mean a left-to-right application, i.e. the term $((m_1 \cdot m_2) \cdot m_3) \cdots$. Note that concatenation is sensitive to the order (bracketing) in which it is applied. 

Let us now fix some notation for describing automata. Let $\Delta$ be a finite alphabet. A deterministic finite automaton (DFA) over $\Delta$ is a tuple $A = (S, s_{in}, \delta, F)$ where $S$ is a finite set of states, $s_{in} \in S$ is the initial state, $\delta : S \times \Delta \to S$ is the transition function and $F \subseteq S$ is the set of final states. $\delta$ can be extended to $\hat{\delta} : S \times \Delta^* \to S$ by defining $\hat{\delta}(s, \varepsilon) = s$ and $\hat{\delta}(s, x \cdot d) = \delta(\hat{\delta}(s, x), d)$ where $x \in \Delta^*$ and $d \in \Delta$. We say that a word $x \in \Delta^*$ is accepted by $A$ if $\hat{\delta}(s_{in}, x) \in F$. The language of $A$, denoted by $L(A)$, is the set of words in $\Delta^*$ accepted by $A$. 
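As a small illustration of the extended transition function, here is a sketch that encodes $\delta$ as a dictionary keyed by (state, letter); this encoding and the example automaton in the usage below are assumptions for illustration only.

```python
def delta_hat(delta, s, word):
    """Extended transition function: delta_hat(s, eps) = s and
    delta_hat(s, x.d) = delta(delta_hat(s, x), d), computed left to right."""
    for d in word:
        s = delta[(s, d)]
    return s


def accepts(dfa, word):
    """dfa = (S, s_in, delta, F) with delta a dict keyed by (state, letter)."""
    _states, s_in, delta, final = dfa
    return delta_hat(delta, s_in, word) in final
```

A hypothetical two-state DFA over $\{u, v\}$ that accepts the words with an even number of `v` accepts the empty word and `"uvvu"`, and rejects `"uv"`.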

We are now ready to define CMSGs. Let us fix a finite set of CMSCs $M$; we will call these atomic CMSCs. We say that a sequence of CMSCs $m_1, m_2, \ldots, m_n$ from $M$ is well-defined iff the concatenation of the CMSCs in this order is defined. We say that it is complete if the concatenated CMSC has no unmatched send or receive events, i.e. it is an MSC. 

Let $\Pi$ be a finite alphabet and $h : \Pi \to M$ be a bijection. Given a word $x = d_0 \ldots d_n \in \Pi^*$, if $h(d_0), \ldots, h(d_n)$ is well-defined, then $x$ defines the CMSC $cmsc(x) = h(d_0) \cdot h(d_1) \cdots h(d_n)$. We say that $x$ is well-defined if $h(d_0), \ldots, h(d_n)$ is well-defined, and say $x$ is complete if $cmsc(x)$ is an MSC. 

Definition 4. A compositional message sequence graph (CMSG) is a tuple $G = (\Pi, M, h, A)$ where $\Pi$ is a finite alphabet, $M$ is a finite set of CMSCs, $h : \Pi \to M$ is a bijection, and $A$ is a DFA over $\Pi$. We require that every $x \in L(A)$ is well-defined and complete. The CMSG represents the set of MSCs $msc(G) = \{cmsc(x) \mid x \in L(A)\}$. □ 
For example, consider the CMSG in Figure 1. It depicts a producer-consumer example where $p$ sends messages continuously to $c$. At some point, $c$ sends the "abort" signal to $p$ requesting it to stop sending messages, but this message takes an arbitrarily long time to reach $p$. The figure shows a typical behaviour and a CMSG which represents such scenarios. 




Fig. 1. A compositional message sequence graph 



It turns out that, given a structure $G = (\Pi, M, h, A)$, we can decide if it is a CMSG. 

One way to simplify CMSGs is to partition the message alphabet $\Sigma_m$ into two disjoint sets and, in all the atomic CMSCs, always use messages from the first set for unmatched sends and receives and messages from the second set for matched sends and receives (as done in [GMP01]). Such a condition ensures that when concatenating CMSCs, the resulting CMSC will never be degenerate. Also, one can show that any CMSG can be converted to an equivalent one over such restricted atomic CMSCs. 

MSGs are just like CMSGs except that the atomic CMSCs are in fact MSCs. The class of CMSGs clearly extends that of MSGs; in fact, CMSGs are strictly more powerful. It can be shown that the behaviour illustrated in Figure 1 can neither be represented by an MSG nor is it a regular MSC-language (as defined in [HMKT00b]). 

We use monadic second-order logic (MSO) to describe properties of MSCs. This logic is defined as follows. We have at our disposal a countable number of first-order variables $\{x, y, z, \ldots\}$ and second-order variables $\{X, Y, Z, \ldots\}$; the former are interpreted over events and the latter over subsets of events of an MSC. The atomic formulas are of the form $(x \to y)$ (meaning that $x$ is a send-event whose corresponding receive-event is $y$), $(x \le_p y)$ (meaning $x$ and $y$ are events of $p$ and $x$ is before $y$ in the total order of process $p$), $Q_r(x)$, where $r \in \Sigma$ (meaning the label of the event $x$ is $r$), and $(x \in X)$. Other formulas are formed using the boolean connectives $\vee$ and $\neg$ and using quantification over first-order and second-order variables. 
This logic is a powerful structural logic and can express many interesting properties of MSCs. It can be shown, for example, that the temporal logic TLC over MSCs considered in [Pel00] can be embedded into MSO. 

Let $\varphi$ be a formula and $m = (E, \{\le_p\}_{p \in \mathcal{P}}, \lambda, \eta)$ be an MSC. An interpretation of a set of first-order and second-order variables $\mathcal{V}$ is a function $I$ that assigns to each first-order variable an event in $E$, and assigns a set of events of $E$ to each second-order variable. The semantics of when $m$ satisfies $\varphi$ under an interpretation $I$ is defined in the obvious way and leads to the definition of when $m$ satisfies a sentence $\varphi$. 

3 CMSGs and Regular Representative Linearizations 

We now show that the class of MSC-languages represented by CMSGs is the 
same as that definable using a regular set of representative linearizations. 

Lemma 1. Let $\mathcal{L}$ be a collection of MSCs over $\mathcal{P}$. There exists a CMSG $G = (\Pi, M, h, A)$ that represents $\mathcal{L}$ iff there exists a regular set $L \subseteq \Sigma^*$ of well-formed words which is a representative linearization of $\mathcal{L}$. 

Proof: Let $G = (\Pi, M, h, A)$ be a CMSG such that $msc(G) = \mathcal{L}$. For every $m \in M$, let us fix one linearization $lin(m)$ of $m$. Let $L = \{lin(h(d_1)) \cdots lin(h(d_n)) \mid d_1 \ldots d_n \in L(A)\}$. It is easy to see that $L$ is a regular representative linearization of $\mathcal{L}$. The other direction follows from [GMP01]. Let $L \subseteq \Sigma^*$ be a representative linearization of $\mathcal{L}$, and let $L$ be a regular language accepted by a DFA $B = (S, s_{in}, \delta, F)$ over $\Sigma$. For every $r \in \Sigma$, let $M$ contain a CMSC with a single event labelled $r$. One can now use $B$ itself to define a CMSG which has $L$ as a representative linearization. □ 

Let us now turn to the model checking problem for CMSGs: given a CMSG $G$ (or a regular representative linearization $L$) and an MSO sentence $\varphi$, do all the MSCs represented by $G$ satisfy $\varphi$? We show that this problem is decidable. We exhibit two proofs of this theorem, one which extends the proof of decidability of model checking MSGs [Mad01] and the other which extends the proof of decidability of model checking for regular MSC-languages [HMKT00b]. 

A CMSC $m = (E, \{\le_p\}_{p \in \mathcal{P}}, \lambda, \eta)$ is said to be $b$-memory bounded, where $b \in \mathbb{N}$, if the number of unmatched sends of any type is at most $b$, i.e., for every $(p!q, a) \in \Sigma$, 

$$|\{e \in E \mid \lambda(e) = (p!q, a) \text{ and } \eta(e) \text{ is not defined}\}| \le b.$$ 

A sequence of CMSCs $m_1, m_2, \ldots, m_k$ is $b$-memory bounded if for all its prefixes $m_1, m_2, \ldots, m_i$ with $i \le k$, the CMSC $m_1 \cdot m_2 \cdots m_i$ is $b$-memory bounded. A CMSG $G = (\Pi, M, h, A)$ is $b$-memory bounded if for every $d_1 d_2 \ldots d_n \in L(A)$, the sequence of CMSCs $h(d_1), h(d_2), \ldots, h(d_n)$ is $b$-memory bounded. The fact that a CMSG defines a collection of MSCs forces it to be $b$-memory bounded: 

Lemma 2. For every CMSG $G$, there exists $b \in \mathbb{N}$, which is computable, such that $G$ is $b$-memory bounded. 

Proof: Let $G = (\Pi, M, h, A)$ be a CMSG and let $A = (S, s_{in}, \delta, F)$. For any state $s \in S$ which is reachable from the initial state and from which a final state can be reached, one can show that all CMSCs defined on a path from the initial state to $s$ must have the same number of unmatched sends of each type (from the fact that all accepting paths define MSCs). It then follows that if $b$ is the maximum number of unmatched sends of any type at any such state, then $G$ is $b$-memory bounded; since it suffices to examine one path for each of the finitely many such states, $b$ can be computed. □ 

The following lemma will be used to show that the model checking problem 
for CMSGs is decidable. 

Lemma 3. Let $\Pi$ be a finite alphabet, $M$ be a set of CMSCs and $h : \Pi \to M$ be a bijection. Let $\varphi$ be an MSO sentence and $b \in \mathbb{N}$. Then the collection of words $w = d_1 \ldots d_n \in \Pi^*$ such that $h(d_1), \ldots, h(d_n)$ is well-defined, complete and $b$-memory bounded, and $cmsc(w) \models \varphi$, is a regular subset of $\Pi^*$. 

Proof: The proof follows the corresponding proof for MSGs in [Mad01]. Let $\varphi$ be an MSO formula with free second-order variables $\mathcal{V}_\varphi = \{X_1, \ldots, X_k\}$ (assume for now that we have only second-order variables). Then let $\Pi_\varphi = \{(d, I) \mid d \in \Pi,\ I : \mathcal{V}_\varphi \to 2^E$, where $E$ is the set of events of $h(d)\}$. Each letter in $\Pi_\varphi$ encodes, along with a letter $d \in \Pi$, an interpretation of the free variables over the events of the CMSC corresponding to $d$. The idea is to construct an automaton $A_\varphi$ which accepts a word $(d_1, I_1) \ldots (d_n, I_n)$ iff the sequence $h(d_1), \ldots, h(d_n)$ is well-defined, complete and $b$-memory bounded, and the MSC $m = cmsc(d_1 \ldots d_n)$, under the combined interpretation defined by $I_1, \ldots, I_n$ on the events of $m$, satisfies $\varphi$. This is done inductively on the structure of the formula. 

First, the set of all words $d_1 \ldots d_n \in \Pi^*$ such that $h(d_1), \ldots, h(d_n)$ is a well-defined and complete $b$-memory bounded sequence is regular. We run an automaton accepting this set in parallel with the automaton we construct. 

The atomic formula $x \to y$ is the hardest to handle. The automaton checks that the interpretation indeed maps $x$ and $y$ to singleton sets; assume it does. The automaton now has to check whether $y$ is the corresponding receive-event of $x$. Since the sequence we are reading is $b$-memory bounded, we know the number of sends of the kind $(p!q, a)$ which are pending when $x$ happens; let this be $k$ ($k$ is bounded by the sum of $b$ and the maximum number of events in an atomic CMSC). The event $y$ is the corresponding receive-event of $x$ if the number of receive-events of the kind $(q?p, a)$ between $x$ and $y$ is $k$. For each atomic CMSC, we know a priori how many receive-events of this kind it has. Since the number of atomic CMSCs, between the ones to which $x$ and $y$ belong, that have at least one receive-event of this kind is bounded, we can construct, with some bookkeeping, an automaton which checks this property. 

Other atomic formulas are easy to handle; formulas obtained by disjunction, 
negation and existential quantification can be handled by using the fact that 
automata are closed under union, complement and projection respectively. □ 

Theorem 1. The model checking problem for CMSGs against MSO formulas is decidable. 

Proof: Let $G = (\Pi, M, h, A)$ be a CMSG and $\varphi$ be an MSO formula. From Lemma 2, it follows that there exists $b \in \mathbb{N}$ (which can be computed) such that $G$ is $b$-memory bounded. Construct, using Lemma 3, an automaton $A_\varphi$ over $\Pi^*$ that accepts a word $w = d_1 \ldots d_n$ iff $h(d_1), \ldots, h(d_n)$ is well-defined, complete and $b$-memory bounded, and $cmsc(w) \models \varphi$. Clearly, all MSCs in $msc(G)$ satisfy $\varphi$ iff $L(A) \subseteq L(A_\varphi)$, which can be checked. □ 

Let us now turn to linearizations and use them to solve the model checking problem. Let $w \in \Sigma^*$ be a well-formed word and $b \in \mathbb{N}$. The word $w$ is said to be $b$-bounded if for every prefix $x$ of $w$ and every type $(p!q, a)$, the difference between the number of send-events in $x$ of type $(p!q, a)$ and the number of receive-events in $x$ of the corresponding type $(q?p, a)$ is at most $b$. A language is said to be $b$-bounded if each word in it is $b$-bounded. 
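For a single well-formed word, the least $b$ for which it is $b$-bounded can be found in one left-to-right pass by keeping a per-channel balance of sends minus receives. The sketch below assumes a hypothetical label encoding `(proc, kind, peer, msg)`, where `("p", "!", "c", "a")` stands for $p!c,a$ and `("c", "?", "p", "a")` for $c?p,a$; it does not itself check well-formedness.

```python
from collections import Counter


def channel(label):
    """Map a send p!q,a or a receive q?p,a to the channel key (p, q, a)."""
    proc, kind, peer, msg = label
    return (proc, peer, msg) if kind == "!" else (peer, proc, msg)


def bound_of(word):
    """Smallest b such that the (assumed well-formed) word is b-bounded:
    the maximum, over prefixes and channels, of #sends - #receives."""
    balance = Counter()
    worst = 0
    for label in word:
        ch = channel(label)
        balance[ch] += 1 if label[1] == "!" else -1
        worst = max(worst, balance[ch])
    return worst


def is_b_bounded(word, b):
    return bound_of(word) <= b
```

Sending three messages before receiving any gives bound 3, whereas alternating sends and receives stays within bound 1; messages of different types are counted on separate channels.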

Lemma 4. Given an MSO formula $\varphi$ and $b \in \mathbb{N}$, the set of all well-formed words $w \in \Sigma^*$ such that $w$ is $b$-bounded and $msc(w) \models \varphi$ is regular. 

Proof: From $\varphi$ we construct, using the technique in [HMKT99, Lemma 4.1], an MSO formula $\hat{\varphi}$ over finite words in $\Sigma^*$ such that for $w \in \Sigma^*$, $w \models \hat{\varphi}$ iff $w$ is well-formed, $b$-bounded and $msc(w) \models \varphi$. The required conclusion follows since the strings described by MSO formulae on words form a regular set [Tho90]. The formula $\hat{\varphi}$ is constructed so that it matches the send and receive events of the MSC corresponding to the word, and dynamically interprets $\varphi$ using this matching and the local total orders. □ 

We can now give another proof of the result that model checking CMSGs 
against MSO specifications is decidable using linearizations. 

Alternative proof of Theorem 1: In view of Lemma 1, let the CMSG be presented as a regular representative linearization $L$ via a DFA $B$ accepting $L$. Since any regular language of well-formed words is $b$-bounded for some computable $b$ (see [HMKT00b]), we can find a $b$ such that $L$ is $b$-bounded. Construct, using Lemma 4, an automaton $B'$ that accepts all $b$-bounded words which represent MSCs that satisfy $\varphi$. Clearly, all the MSCs represented by $L$ satisfy $\varphi$ iff $L(B) \subseteq L(B')$, which is decidable. □ 

Another important (albeit simple) consequence of dealing with representative linearizations is that it allows us to show decidability of the linearization model checking problem [AY99] for trace-closed or linearization-closed specifications. The linearization model checking problem for CMSGs is: given a CMSG $G$ and a regular language $L$ over $\Sigma$, is $lin(msc(G)) \subseteq L$? This problem is known to be undecidable even for MSGs [AY99]. However, if $L$ is linearization-closed, we can effectively solve the problem: using Lemma 1, construct $L'$, a regular representative linearization of $msc(G)$. Since $L$ is linearization-closed, an MSC has all its linearizations in $L$ as soon as one of them is in $L$; hence all linearizations of MSCs represented by $G$ belong to $L$ iff $L' \subseteq L$. 
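The final inclusion test $L' \subseteq L$ is the standard product-automaton check: search the reachable pairs of states for one that is accepting in the first automaton but not in the second. The sketch below assumes both languages are given by complete DFAs encoded as `(s_in, delta, final)` with `delta` a dict keyed by (state, letter); the encoding and the function name are illustrative. It visits at most $|S_1| \cdot |S_2|$ pairs, so it runs in polynomial time.

```python
from collections import deque


def included(dfa1, dfa2, alphabet):
    """Decide L(dfa1) ⊆ L(dfa2) for complete DFAs given as
    (s_in, delta, final), with delta a dict keyed by (state, letter)."""
    start = (dfa1[0], dfa2[0])
    seen, queue = {start}, deque([start])
    while queue:                       # BFS over the product automaton
        s1, s2 = queue.popleft()
        if s1 in dfa1[2] and s2 not in dfa2[2]:
            return False               # some word is accepted by dfa1 only
        for d in alphabet:
            nxt = (dfa1[1][(s1, d)], dfa2[1][(s2, d)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```

For example, the words over $\{a, b\}$ with an even number of `a` are included in the set of all words, but not conversely.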

Also, the above procedure works in time polynomial in the sizes of the given CMSG and the specification automaton. The ease of the above model-checking problem raises the hope of defining temporal logics whose models are trace-closed (akin to LTrL in the setting of Mazurkiewicz traces [TW97]), as these logics may yield a fast verification procedure. The logic m-LTL introduced in [MR00] and the logic TLC for MSCs [Pel00] can be seen as starting points for this endeavour. 
4 Extended CMSGs 

In the definition of concatenation of two CMSCs $m_1$ and $m_2$ (Definition 3), we required that $m_1$ have no unmatched receives. We can drop this requirement and define concatenation so that it also matches unmatched receive-events of $m_1$ with corresponding unmatched send-events in $m_2$ (if any). This relaxed definition is the same as Definition 3 except that condition (C2) is replaced by: 

(C2') Let $e \in S_E$, let $\eta_1$ and $\eta_2$ be undefined on $e$, and let $\lambda(e) = (p!q, a)$. If there is an event $e' \in R_E$ such that $\eta_1$ and $\eta_2$ are undefined on $e'$, $\lambda(e') = (q?p, a)$ and $|\{f \in E \mid f \le_p e,\ \lambda(f) = (p!q, a)\}| = |\{f' \in E \mid f' \le_q e',\ \lambda(f') = (q?p, a)\}|$, then we set $\eta(e) = e'$. 

Again, this concatenation is defined only if the resulting structure is a CMSC (i.e. the resulting causal order is a partial order and the structure is non-degenerate). 

We can now define extended CMSGs (XCMSGs), which are exactly like CMSGs (see Definition 4) except that the MSCs represented by them are defined using the relaxed notion of concatenation above. As in a CMSG, we require that the accepting paths of the XCMSG define MSCs. 

Again, given a structure $G = (\Pi, M, h, A)$, one can decide if it is an XCMSG. Also, the class of MSCs definable in this fashion is strictly larger than the class defined by CMSGs. For example, Figure 2 illustrates an XCMSG and a typical MSC represented by it. The figure describes a scenario where three processes $p$, $q$ and $r$ exchange messages. The channel from $p$ to $r$ is a slow data channel while the channels from $p$ to $q$ and from $q$ to $r$ are fast. Consequently, the messages $b$ followed by $c$ overtake an arbitrary number of messages $a$ from $p$ to $r$. It is not difficult to see that there is no equivalent CMSG representation for it. 




Fig. 2. An extended CMSG 
However, XCMSGs are amenable to model checking against MSO specifications. Analogous to the definition of $b$-memory boundedness for CMSCs, one can define the notion of a $(b', b)$-memory bounded CMSC, where $b, b' \in \mathbb{N}$. A CMSC $m$ is $(b', b)$-memory bounded if the number of unmatched sends of any type is at most $b$ (i.e., it is $b$-memory bounded) and the number of unmatched receives of any type is at most $b'$. We can extend this notion to sequences of CMSCs and to XCMSGs. It then turns out that any XCMSG is $(b', b)$-memory bounded for some $b, b' \in \mathbb{N}$, and we have a result analogous to Lemma 3: 

Lemma 5. Let $\Pi$ be a finite alphabet, $M$ be a set of CMSCs and $h : \Pi \to M$ be a bijection. Let $\varphi$ be an MSO formula and $b, b' \in \mathbb{N}$. Then the collection of words $w = d_1 \ldots d_n \in \Pi^*$ such that $h(d_1), \ldots, h(d_n)$ is well-defined, complete and $(b', b)$-memory bounded, and $cmsc(w) \models \varphi$, is a regular subset of $\Pi^*$. □ 

We then have 

Theorem 2. The model checking problem for XCMSGs against MSO specifications is decidable. □ 

The example in Figure 2 also illustrates the fact that there is no regular representative linearization of the MSCs represented by the given XCMSG. It turns out that the notion analogous to linearizations in this setting is that of a semi-linearization. For an MSC $m = (E, \{\le_p\}_{p \in \mathcal{P}}, \lambda, \eta)$, an event semi-linearization of $m$ is a linear order on $E$ which extends the orders $\le_p$ for each $p \in \mathcal{P}$. Note that the order need not extend the causal order $\le$: a receive-event could be ordered before its corresponding send-event. The semi-linearizations of an MSC $m$ are the sequences $\lambda(e_1), \ldots, \lambda(e_k)$ where $e_1, \ldots, e_k$ is an event semi-linearization of $m$. 
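To make the notion concrete, the following sketch tries to recover $msc(w)$ from a word and reports whether the word is semi well-formed. It assumes, as for non-degenerate MSCs, that the $k$-th send of a type $(p!q, a)$ is matched with the $k$-th receive of type $(q?p, a)$ regardless of where they occur in the word, and then checks that the local orders plus the send-to-receive edges are acyclic (Kahn's algorithm). The label encoding `(proc, kind, peer, msg)` and the function name `msc_of` are hypothetical.

```python
from collections import Counter, defaultdict, deque


def msc_of(word):
    """Return {send position: receive position} of msc(word), or None if
    the word is not semi well-formed. Labels are (proc, kind, peer, msg):
    ("p", "!", "q", "a") is p!q,a and ("q", "?", "p", "a") is q?p,a."""
    sends, recvs = defaultdict(list), defaultdict(list)
    for i, (proc, kind, peer, msg) in enumerate(word):
        if kind == "!":
            sends[(proc, peer, msg)].append(i)
        else:
            recvs[(peer, proc, msg)].append(i)
    match = {}
    for ch in set(sends) | set(recvs):
        if len(sends[ch]) != len(recvs[ch]):
            return None              # some message is never sent or received
        # Assumed FIFO rule: the k-th send of a type meets the k-th receive.
        match.update(zip(sends[ch], recvs[ch]))
    # Causal graph: local process orders plus send -> receive edges.
    succ, indeg, last = defaultdict(list), Counter(), {}
    for i, (proc, *_rest) in enumerate(word):
        if proc in last:
            succ[last[proc]].append(i)
            indeg[i] += 1
        last[proc] = i
    for s, r in match.items():
        succ[s].append(r)
        indeg[r] += 1
    # Kahn's algorithm: the causal order is a partial order iff acyclic.
    queue = deque(i for i in range(len(word)) if indeg[i] == 0)
    visited = 0
    while queue:
        i = queue.popleft()
        visited += 1
        for j in succ[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                queue.append(j)
    return match if visited == len(word) else None
```

A word that lists a receive before its send is accepted (it is a semi-linearization but not a linearization), while a word whose matching would create a causal cycle, such as two processes each waiting to receive before sending, is rejected.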

Again, while an MSC defines a non-empty set of semi-linearizations, a semi-linearization $w$ corresponds to a unique MSC, denoted by $msc(w)$. For a word $w \in \Sigma^*$, we say $w$ is semi well-formed if there is an MSC for which $w$ is a semi-linearization. We can now represent a collection of MSCs through a set of representative semi-linearizations. The following is analogous to Lemma 1. 

Lemma 6. Let $\mathcal{L}$ be a collection of MSCs over $\mathcal{P}$. Then there exists an XCMSG $G = (\Pi, M, h, A)$ which represents $\mathcal{L}$ iff there exists a regular set $L \subseteq \Sigma^*$ of semi well-formed words which is a representative semi-linearization of $\mathcal{L}$. 

Proof: Given an XCMSG $G$, we can fix some linearization of each atomic CMSC and realize a set of representative semi-linearizations as a homomorphic image of the language accepted by $A$, where each letter is mapped to the fixed linearization of the CMSC corresponding to it. Since we finish all the events of a CMSC before moving to another CMSC, it is clear that the resulting words respect the local orders of the individual CMSCs. Hence this set is semi well-formed. The proof of the converse is similar to that of Lemma 1. □ 

Also, a result analogous to Lemma 4 goes through smoothly. A semi well-formed word $w \in \Sigma^*$ is said to be $(b', b)$-bounded if for every prefix $x$ of $w$, the number of unmatched send-events in $x$ of any type is at most $b$ and the number of unmatched receive-events in $x$ of any type is at most $b'$. A language is said to be $(b', b)$-bounded if each word in it is $(b', b)$-bounded. 

Lemma 7. Given an MSO sentence $\varphi$ and $b, b' \in \mathbb{N}$, the set of all semi well-formed words $w \in \Sigma^*$ such that $w$ is $(b', b)$-bounded and $msc(w) \models \varphi$ is regular. 

We can also show that any regular language of semi well-formed words is $(b', b)$-bounded for some $b, b' \in \mathbb{N}$. Then we can use Lemma 7 to give an alternative proof of Theorem 2. 
Abstract. We consider the problem of scheduling $n$ jobs on a single machine. Each job has a release date, when it becomes available for processing, and, after completing its processing, requires an additional delivery time. Feasible schedules are further restricted by job precedence constraints, and the objective is to minimize the time by which all jobs are delivered. In the notation of Graham et al. [2], this problem is denoted $1|r_j, prec|L_{max}$. We develop a polynomial time approximation scheme whose running time depends only linearly on the input size. This linear complexity bound gives a substantial improvement of the best previously known polynomial bound [4]. 



1 Introduction 

We shall study the following problem. There is a set of $n$ jobs $J_j$ ($j = 1, \ldots, n$). Each job $J_j$ must be processed without interruption for $p_j \ge 0$ time units on a single machine, which can process at most one job at a time. Each job has a release date $r_j \ge 0$, when it first becomes available for processing and, after completing its processing on the machine, requires an additional delivery time $q_j \ge 0$; if $S_j$ ($\ge r_j$) denotes the time $J_j$ starts processing, it has been delivered at time $L_j = S_j + p_j + q_j$. Delivery is a non-bottleneck activity, in that all jobs may be simultaneously delivered. Feasible schedules are further restricted by job precedence constraints given by the partial order $\prec$, where $J_j \prec J_k$ means that job $J_k$ must be processed after job $J_j$. Our objective is to minimize, over all possible schedules, the maximum delivery time, i.e., $L_{max} = \max_j L_j$. The problem as stated is strongly NP-hard even if there are no precedence constraints [8]. In the notation of Graham et al. [2], this problem is denoted $1|r_j, prec|L_{max}$, while the problem without precedence constraints is denoted $1|r_j|L_{max}$. 
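For a fixed processing sequence, the objective is easy to evaluate: start each job as soon as it is released and the machine is free, and take the largest $L_j = S_j + p_j + q_j$. The sketch below assumes jobs are indexed $0, \ldots, n-1$ and that the given order already respects $\prec$ (this is not checked); the function name is illustrative.

```python
def max_delivery_time(order, p, r, q):
    """L_max of the schedule that processes the jobs in `order`, starting
    each job as soon as it is released and the machine is free."""
    t = 0          # time at which the machine becomes free
    lmax = 0
    for j in order:
        start = max(t, r[j])           # S_j >= r_j
        t = start + p[j]               # completion time S_j + p_j
        lmax = max(lmax, t + q[j])     # L_j = S_j + p_j + q_j
    return lmax
```

With two hypothetical jobs $p = (2, 1)$, $r = (0, 1)$, $q = (1, 5)$, the order $(J_0, J_1)$ gives $L_{max} = 8$ while $(J_1, J_0)$ gives $7$, so the order matters even without precedences.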

Since these scheduling problems are known to be hard to solve optimally, research focuses on polynomial-time approximation algorithms that produce a solution close to the optimal one. Ideally, one hopes to obtain a family of polynomial algorithms such that for any given $\varepsilon > 0$ the corresponding algorithm is guaranteed to produce a solution with a value within a factor of $(1 + \varepsilon)$ of the optimum value; such a family is called a polynomial time approximation scheme (PTAS). 

Hall & Shmoys [3,5] propose two PTASs for problem $1|r_j|L_{max}$, the running times of which are $O(\ldots)$ and $O(n \log n + \ldots)$. They also give a PTAS for problem $1|r_j, prec|L_{max}$ [4], which consists of executing, $\log_2 \Delta$ times, an extended version of their previous PTAS for $1|r_j|L_{max}$, where $\Delta$ denotes an upper bound on the optimal value of any given instance whose data are assumed to be integral. This polynomial running time should be contrasted with the time complexity of their result for problem $1|r_j|L_{max}$, where they were able to achieve a considerably better time. To some extent, this is not surprising, since precedence constraints add a substantial degree of difficulty, and one important area of research in scheduling theory has been to study under what conditions a precedence-constrained problem is computationally harder than its counterpart with independent jobs. 

In this paper we present a new PTAS for $1|r_j, prec|L_{max}$ that runs in $O(n + \ell + \ldots)$ time, where $\ell$ denotes the number of precedences. This linear complexity bound is a substantial improvement compared to the above mentioned results. Note that the time complexity of this PTAS is best possible with respect to the input size. Moreover, the existence of a PTAS whose running time is also polynomial in $1/\varepsilon$ for a strongly NP-hard problem would imply P=NP [1]. We also improve the previous results [3,5] for problem $1|r_j|L_{max}$ with respect to both $n$ and $\varepsilon$. Finally, we remark that the multiplicative constant hidden in the $O(n + \ell)$ running time of our algorithm is reasonably small and does not depend on the accuracy $\varepsilon$. 

To achieve these improvements, a better and deeper analysis of the precedence structure is needed. We show that the precedence graph can be simplified into a more primitive graph. This simplification depends on the desired precision $\varepsilon$ of approximation; the closer $\varepsilon$ is to zero, the more closely the modified graph should resemble the original one. With this simplified precedence structure it is possible to partition jobs into a constant number of subsets of jobs having similar precedence relations. Then, jobs belonging to the same subset can be grouped together into single jobs. The resulting instance is modified to obtain a constant number of instances with different release dates and delivery times; the final step consists of executing the extended Jackson's rule on this constant number of instances and outputting the best schedule generated. 

2 Structuring the Input 

As a first step towards the construction of a PTAS for problem $1|prec, r_j|L_{max}$, we discuss several techniques which add structure to the input data. Here the main idea is to turn a difficult instance into a more primitive instance that is easier to tackle. This simplification depends on the desired precision of approximation. 

For simplicity, we shall assume that $1/\varepsilon$ is integral and $0 < \varepsilon < 1$. We start by providing some lower and upper bounds on the optimal value $opt$. We define $P = \sum_{j=1}^{n} p_j$, $p_{max} = \max_j p_j$, $r_{max} = \max_j r_j$, $q_{max} = \max_j q_j$ and $d = \max_{j=1,\ldots,n} \{r_j + p_j + q_j\}$. Let $LB = \max\{P, d\}$; we claim that $LB \le opt \le 3LB$. Indeed, since $P$ and $d$ are lower bounds on $opt$, $LB$ is also a lower bound on $opt$. We show that $3LB$ is an upper bound on $opt$ by exhibiting a schedule with value at most $3LB$. Starting from time $r_{max}$ all jobs have been released, and they can be scheduled one after the other in any fixed ordering of the jobs that is consistent with the precedence relation; this can be obtained by topologically sorting the precedence graph. Then every job is completed by time $r_{max} + P$ and the maximum delivery time is bounded by $r_{max} + P + q_{max} \le P + 2d \le 3LB$. By dividing every release, delivery and processing time by $LB$, we may (and will) assume, without loss of generality, that $LB = 1$ and 

$1 \le opt \le 3$. (1) 
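Both bounds can be computed directly. The sketch below evaluates $LB = \max\{P, d\}$ and the value of the schedule used in the upper-bound argument (wait until $r_{max}$, then run the jobs back to back in a topological order of the precedence graph). It assumes precedences are given as pairs `(j, k)` meaning $J_j \prec J_k$ and uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) for the topological sort; the function names are illustrative.

```python
from graphlib import TopologicalSorter


def lower_bound(p, r, q):
    """LB = max{P, d} with P = sum_j p_j and d = max_j (r_j + p_j + q_j)."""
    P = sum(p)
    d = max(rj + pj + qj for rj, pj, qj in zip(r, p, q))
    return max(P, d)


def back_to_back_value(p, r, q, prec):
    """Value of the schedule from the upper-bound argument: wait until
    r_max, then run all jobs without idle time in a topological order."""
    preds = {j: set() for j in range(len(p))}
    for j, k in prec:          # (j, k) encodes J_j -< J_k
        preds[k].add(j)
    t = max(r)                 # every job is released by r_max
    value = 0
    for j in TopologicalSorter(preds).static_order():
        t += p[j]              # completion time of J_j
        value = max(value, t + q[j])
    return value
```

Whatever topological order is produced, the resulting value lies between $LB$ and $3LB$, as the argument above shows.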

Following Lageweg, Lenstra & Rinnooy Kan [7], if $J_j \prec J_k$ and $r_j > r_k$, then we can reset $r_k := r_j$ and each feasible schedule will remain feasible. Similarly, if $q_j < q_k$ then we can reset $q_j := q_k$ without changing the objective function value of any feasible schedule. Thus, by repeatedly applying these updates we can always obtain an equivalent instance that satisfies 



$J_j \prec J_k \Rightarrow (r_j \le r_k \text{ and } q_j \ge q_k)$ (2) 

Such a resetting requires $O(\ell)$ time. In the following we will assume, w.l.o.g., that condition (2) holds. 
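One way to apply the resets in a single sweep, sketched below under the assumption that precedences are given as arcs `(j, k)` meaning $J_j \prec J_k$: propagate release dates forward along a topological order and delivery times backward along the reverse order, so each arc is examined once per pass. The use of Python's `graphlib.TopologicalSorter` and the function name are illustrative choices, not taken from the paper.

```python
from graphlib import TopologicalSorter


def propagate(r, q, prec):
    """Return updated (r, q) satisfying J_j -< J_k => r_j <= r_k, q_j >= q_k."""
    n = len(r)
    preds = {k: set() for k in range(n)}
    succ = {j: [] for j in range(n)}
    for j, k in prec:
        preds[k].add(j)
        succ[j].append(k)
    order = list(TopologicalSorter(preds).static_order())
    r, q = list(r), list(q)
    for j in order:                  # predecessors are final before successors
        for k in succ[j]:
            r[k] = max(r[k], r[j])   # reset r_k := r_j when r_j > r_k
    for j in reversed(order):        # successors are final before predecessors
        for k in succ[j]:
            q[j] = max(q[j], q[k])   # reset q_j := q_k when q_j < q_k
    return r, q
```

On a chain $J_0 \prec J_1 \prec J_2$ with $r = (5, 0, 3)$ and $q = (0, 4, 1)$, this yields $r = (5, 5, 5)$ and $q = (4, 4, 1)$.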

A technique used by Hall and Shmoys [3] allows us to deal with only a constant number of release dates and delivery times. The idea is to round each release date and delivery time down to the nearest multiple of $\varepsilon$, i.e., to the largest value $i\varepsilon$, $i \in \mathbb{N}$, not exceeding it. Since $r_{max} \le d \le 1$ (and likewise $q_{max} \le d \le 1$), the number of different release dates and delivery times is now bounded by $1/\varepsilon + 1 \le 2/\varepsilon$ (we are assuming $0 < \varepsilon < 1$). Clearly, the optimal value of this transformed instance cannot be greater than $opt$. Every feasible solution for the modified instance can be transformed into a feasible solution for the original instance just by adding $\varepsilon$ to each job's starting time, and reintroducing the original delivery times. It is easy to see that the solution value may increase by at most $2\varepsilon$. Thus in the remainder of this paper, without loss of generality, we shall restrict our attention to the case where there are only a constant number of release dates and delivery times. 
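The rounding step is a one-liner per value; the sketch below uses exact fractions so that values which are already multiples of $\varepsilon$ are not perturbed by floating-point error. It assumes $\varepsilon = 1/k$ with $k$ integral, as stated above; the function name is illustrative.

```python
import math
from fractions import Fraction


def round_down(values, inv_eps):
    """Round each value down to the nearest multiple of eps = 1/inv_eps,
    using exact fractions to avoid floating-point rounding errors."""
    eps = Fraction(1, inv_eps)
    return [math.floor(Fraction(v) / eps) * eps for v in values]
```

With $\varepsilon = 1/4$, values in $[0, 1]$ collapse onto at most $1/\varepsilon + 1 = 5$ distinct points, and each value decreases by less than $\varepsilon$.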

Therefore, we will assume henceforth that the input instance has a constant number of release dates and delivery times, and that condition (2) holds. We shall refer to this instance as $I$. By the previous arguments, $opt \ge opt(I)$, where $opt(I)$ denotes the optimal value for instance $I$. 



2.1 Partitioning Jobs 

Partition the set of jobs into two subsets $L = \{J_j : p_j > \varepsilon\}$ and $S = \{J_j : p_j \le \varepsilon\}$. Let us say that $L$ is the set of large jobs, while $S$ is the set of small jobs. Observe that the number of large jobs is bounded by $P/\varepsilon \le 1/\varepsilon$. We further partition the set $S$ of small jobs as follows. For each small job $J_j \in S$ consider the following 




Grouping Techniques for One Machine Scheduling 271 



three subsets of L: 

- $Pre(j) = \{J_i \in L : J_i \prec J_j\}$, 

- $Suc(j) = \{J_i \in L : J_j \prec J_i\}$, 

- $Free(j) = L \setminus (Pre(j) \cup Suc(j))$. 

Let us say that $T(j) = \{Pre(j), Suc(j), Free(j)\}$ represents a 3-partition of set $L$ with respect to job $J_j$. The number $\tau$ of distinct 3-partitions of $L$ is clearly bounded by the number of jobs and by $3^{|L|} \le 3^{1/\varepsilon}$; therefore $\tau \le \min\{n, 3^{1/\varepsilon}\}$. Let $\{T_1, \ldots, T_\tau\}$ denote the set of all distinct 3-partitions. Now, we define the execution profile of a small job $J_j$ to be a 3-tuple $\langle i_1, i_2, i_3 \rangle$ such that $r_j = \varepsilon \cdot i_1$, $q_j = \varepsilon \cdot i_2$ and $T(j) = T_{i_3}$, where $i_1, i_2 = 0, 1, \ldots, 2/\varepsilon$ and $i_3 = 1, \ldots, \tau$. 
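The execution profiles can be computed by a reachability search from each large job in both directions, after which small jobs are bucketed by $(i_1, i_2, Pre(j), Suc(j))$; since $Free(j)$ is determined by the other two sets, this pair identifies $T(j)$. The sketch below is a naive illustration under stated assumptions (precedence arcs `(j, k)` meaning $J_j \prec J_k$, not necessarily transitively closed; $r_j$, $q_j$ already multiples of $\varepsilon = 1/k$); it costs $O((n + \ell)/\varepsilon)$ time and is not meant to reproduce the paper's running-time analysis.

```python
from collections import defaultdict
from fractions import Fraction


def reachable(start, adj):
    """All vertices reachable from start by at least one arc."""
    seen, stack = set(), [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def execution_profiles(p, r, q, prec, inv_eps):
    """Group the small jobs by execution profile <i1, i2, T(j)>,
    assuming r_j and q_j are multiples of eps = 1/inv_eps."""
    eps = Fraction(1, inv_eps)
    succ, pred = defaultdict(list), defaultdict(list)
    for j, k in prec:
        succ[j].append(k)
        pred[k].append(j)
    large = [j for j in range(len(p)) if p[j] > eps]
    after = {l: reachable(l, succ) for l in large}    # jobs J_j with J_l -< J_j
    before = {l: reachable(l, pred) for l in large}   # jobs J_j with J_j -< J_l
    groups = defaultdict(list)
    for j in range(len(p)):
        if p[j] > eps:
            continue                                   # skip large jobs
        pre = frozenset(l for l in large if j in after[l])
        suc = frozenset(l for l in large if j in before[l])
        key = (int(r[j] / eps), int(q[j] / eps), pre, suc)
        groups[key].append(j)
    return groups
```

For instance, with one large job $J_0$ preceding small jobs $J_1$ and $J_2$, and an unrelated small job $J_3$ (all with equal release and delivery times), $J_1$ and $J_2$ share a profile while $J_3$ forms its own subset.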

Corollary 1. The number $\pi$ of distinct execution profiles is bounded by $\pi \le \tau \cdot 4/\varepsilon^2 = 2^{O(1/\varepsilon)}$. 

Partition the set $S$ of small jobs into $\pi$ subsets $S_1, S_2, \ldots, S_\pi$ such that jobs belonging to the same subset have the same execution profile. Clearly, $S = S_1 \cup S_2 \cup \ldots \cup S_\pi$ and $S_h \cap S_i = \emptyset$ for $i \ne h$. 

2.2 Adding New Precedences 

Let us say that job $J_h$ is a neighbor of set $S_i$ ($i = 1, \ldots, \pi$) if: 

- $J_h$ is a small job; 

- $J_h \notin S_i$; 

- there exists a precedence relation between job $J_h$ and some job in $S_i$. 

Moreover, we say that $J_h$ is a front-neighbor (back-neighbor) of $S_i$ if $J_h$ is a neighbor of $S_i$ and there is a job $J_j \in S_i$ such that $J_j \prec J_h$ ($J_h \prec J_j$). 

Let $n_i = |S_i|$ ($i = 1, \ldots, \pi$), and let $(J_{1,i}, \ldots, J_{n_i,i})$ denote any fixed ordering of the jobs from $S_i$ that is consistent with the precedence relation. In the rest of this paper we consider the restricted problem in which the jobs from $S_i$ are processed according to this fixed ordering. Furthermore, every back-neighbor (front-neighbor) $J_h$ of $S_i$ ($i = 1, \ldots, \pi$) must be processed before (after) every job from $S_i$. This can be accomplished by adding a directed arc from $J_{j,i}$ to $J_{j+1,i}$ for $j = 1, \ldots, n_i - 1$, and by adding a directed arc from $J_h$ to $J_{1,i}$ if $J_h$ is a back-neighbor of $S_i$, or else an arc from $J_{n_i,i}$ to $J_h$ if $J_h$ is a front-neighbor. Note that the number of added arcs can be bounded by $n + \ell$. 

We observe that condition (2) is also valid after these changes. Indeed, if $J_h$ is a back-neighbor of $S_i$ then there is a job $J_j \in S_i$ such that $J_h \prec J_j$, and therefore by condition (2) we have $r_h \le r_j$ and $q_h \ge q_j$. But the jobs from $S_i$ have the same release dates and delivery times; therefore $r_h \le r_j$ and $q_h \ge q_j$ for each $J_j \in S_i$. It follows that if we restrict $J_h$ to be processed before the jobs from $S_i$, condition (2) is still valid. Similar arguments hold if $J_h$ is a front-neighbor. Moreover, all the jobs from $S_i$ have the same release dates and delivery times; therefore condition (2) is still satisfied if we restrict these jobs to be processed in any fixed ordering that is consistent with the precedence relation. 
2.3 Grouping Small Jobs 

Consider the jobs $(J_{1,i}, \ldots, J_{n_i,i})$ from subset $S_i$, for $i = 1, \ldots, \pi$, sorted according to the precedence relations. Let $J_{j,i}$ and $J_{j+1,i}$ be two consecutive jobs from $S_i$ such that $p_{j,i} + p_{j+1,i} \le \varepsilon$, for some $j \in \{1, \ldots, n_i - 1\}$. We "group" together these two jobs to form a grouped job having the same release and delivery time as $J_{j,i}$ (and $J_{j+1,i}$), but processing time equal to the sum of the processing times of $J_{j,i}$ and $J_{j+1,i}$, i.e., $p_{j,i} + p_{j+1,i}$. (This is equivalent to saying that we shall consider only those schedules where jobs $J_{j,i}$ and $J_{j+1,i}$ are processed together, i.e., $J_{j+1,i}$ immediately after $J_{j,i}$.) Furthermore, the new grouped job must be processed after the predecessors of $J_{j,i}$ and before the successors of $J_{j+1,i}$. (Observe that, $J_{j,i}$ ($J_{j+1,i}$) excepted, the set of predecessors (successors) of $J_{j+1,i}$ ($J_{j,i}$) is the same as that of $J_{j,i}$ ($J_{j+1,i}$).) We repeat this process, using the modified set of jobs, until there are no more pairs of consecutive jobs whose processing times sum to at most $\varepsilon$. The same procedure is performed for all other subsets $S_i$. 
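The text leaves the order of the merges open. One valid order is a single greedy left-to-right pass: a job joins the current group while the group's total stays at most $\varepsilon$, otherwise it starts a new group. When a new group starts, the previous group plus the first job of the new group already exceeds $\varepsilon$, so any two consecutive groups exceed $\varepsilon$ and no further merge is possible. The sketch below assumes one subset $S_i$ given as its fixed job order; the function name is illustrative.

```python
def group_chain(chain, p, eps):
    """Greedy left-to-right grouping of the jobs of one S_i, listed in their
    fixed order: a job joins the current group while the group's total
    processing time stays <= eps. Returns (job ids, total p) pairs."""
    groups = []
    for j in chain:
        if groups and groups[-1][1] + p[j] <= eps:
            ids, total = groups[-1]
            groups[-1] = (ids + [j], total + p[j])
        else:
            groups.append(([j], p[j]))
    return groups
```

In a hypothetical instance scaled so that $\varepsilon = 4$, processing times $(2, 2, 2, 1)$ are grouped into $\{J_0, J_1\}$ and $\{J_2, J_3\}$, whose totals $4$ and $3$ together exceed $\varepsilon$.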

Let $I_1$ denote this new modified instance. Let us denote the corresponding processing times, release dates and delivery times of jobs in $I_1$ by $p'_j$, $r'_j$ and $q'_j$, respectively. Observe that in instance $I_1$ the set of large jobs is the same as in $I$, and all the new grouped jobs are still "small", i.e., their processing times are not greater than $\varepsilon$. For simplicity of notation, let us use again $L$ and $S$ to denote, respectively, the set of large and small jobs with respect to instance $I_1$. 

Lemma 1. The number $\nu$ of jobs in instance $I_1$ is bounded by $\nu \le \frac{1}{\varepsilon} + \frac{2}{\varepsilon} + 2\pi = 2^{O(1/\varepsilon)}$. 

Proof. Let us use S} to denote set Si after the described grouping procedure, for 
i = 1,...,7T. Let nj = |S'j^|. Observe that the sum Pf = processing 

times of the jobs from 5”^^ is equal to Pi = eS. Pi- 

First observe that every large job has processing time larger than $\varepsilon$, and therefore there are at most $P/\varepsilon \le 1/\varepsilon$ large jobs in $I_1$. By the way small jobs are grouped together, at the end of the described grouping procedure the sum of the processing times of any pair of consecutive jobs from $S'_i$ is larger than $\varepsilon$. Therefore $\sum_{J_j \in S'_i} p_j \ge \lfloor n'_i/2 \rfloor \cdot \varepsilon \ge (n'_i/2 - 1) \cdot \varepsilon$. Since $(n'_i/2 - 1) \cdot \varepsilon \le P_i$, we have

$\sum_{i=1}^{\pi} n'_i \le \sum_{i=1}^{\pi} 2(1 + P_i/\varepsilon) = 2(\pi + \sum_{i=1}^{\pi} P_i/\varepsilon) \le 2(\pi + 1/\varepsilon)$. □

Note that instance $I_1$ can be computed in $O(n + \ell + 2^{O(1/\varepsilon)})$ time. Indeed, the time required to partition the set of jobs into $\pi$ subsets can be bounded by $O(n + \ell + \pi)$, and $O(n + \ell)$ is the time to add new precedences and group jobs.

We have already observed that the new precedence constraints added in Section 2.2 do not invalidate condition (2). Furthermore, let $G$ denote a set of jobs that have been grouped together as described in this section. Then all jobs from $G$ have the same release date $r_G$ and delivery time $q_G$, and set $G$ has been replaced with a single job $J_G$ having $r_G$ and $q_G$ as release date and delivery time, respectively. If there exist two jobs $J_j$ and $J_k$ such that $J_j \notin G$, $J_k \in G$ and $J_j \prec J_k$ ($J_k \prec J_j$), by condition (2) we know that $r_j \le r_k$ and $q_j \ge q_k$ ($r_k \le r_j$ and $q_k \ge q_j$); by construction, we then also have $r_j \le r_G$, $q_j \ge q_G$ and $J_j \prec J_G$ ($r_G \le r_j$, $q_G \ge q_j$ and $J_G \prec J_j$). Therefore, grouping jobs does not invalidate condition (2).

Lemma 2. For each pair of jobs $J_j$ and $J_k$ of instance $I_1$ the following condition holds: $J_j \prec J_k \Rightarrow (r^1_j \le r^1_k$ and $q^1_j \ge q^1_k)$.

3 The Main Algorithm 

We make use of the following algorithm, which is often called the extended Jackson’s rule (since it generalizes a procedure due to Jackson [6] that solves $1||L_{\max}$): schedule the jobs starting at the smallest $r_j$-value; at each decision point $t$, given by a release date or the finishing time of some job, schedule a job $J_j$ with the following properties: $r_j \le t$, all its predecessors are scheduled, and it has the largest delivery time among such jobs.
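A minimal sketch of the extended Jackson’s rule as just described may help make the decision points concrete. The names `Job` and `extended_jackson` are illustrative assumptions, not from the paper; each job carries a processing time `p`, a release date `r`, a delivery time `q` and the set of names of the jobs that must precede it. The sketch rescans all jobs at each decision point, so it takes quadratic time instead of using a priority queue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    p: float                        # processing time
    r: float                        # release date
    q: float                        # delivery time
    preds: frozenset = frozenset()  # names of jobs that must precede this one


def extended_jackson(jobs):
    """Non-preemptive extended Jackson's rule on a single machine.

    Returns (start, value): `start` maps each job name to its start time and
    `value` is max_j (start_j + p_j + q_j), when all deliveries complete.
    Assumes the precedence graph is acyclic.
    """
    start = {}
    t = float("-inf")  # current time; the first pass jumps to a release date
    while len(start) < len(jobs):
        # Unscheduled jobs whose predecessors have all been scheduled.
        eligible = [j for j in jobs
                    if j.name not in start and j.preds.issubset(start)]
        available = [j for j in eligible if j.r <= t]
        if not available:
            # The machine idles until the next release date of an eligible job.
            t = min(j.r for j in eligible)
            continue
        # Largest delivery time first; ties broken by name for determinism.
        job = min(available, key=lambda j: (-j.q, j.name))
        start[job.name] = t
        t += job.p
    value = max(start[j.name] + j.p + j.q for j in jobs)
    return start, value


if __name__ == "__main__":
    jobs = [Job("A", 2, 0, 5), Job("B", 1, 0, 9, frozenset({"A"})), Job("C", 3, 1, 1)]
    print(extended_jackson(jobs))  # ({'A': 0, 'B': 2, 'C': 3}, 12)
```

In the example, B has the largest delivery time but must wait for its predecessor A; once A finishes, B takes priority over the already released C.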

Our PTAS consists of executing the extended Jackson’s rule on a constant number of instances obtained from $I_1$ by changing the release dates and delivery times of jobs; the best schedule generated is output. Without loss of generality, let us renumber the jobs such that $p^1_1 \ge p^1_2 \ge \cdots \ge p^1_\lambda \ge \cdots \ge p^1_\nu$, where $\lambda$ denotes the number of large jobs. For $j = 1, \ldots, \lambda$, let $R_j = \{r^1_j + i\varepsilon : i \in \mathbb{N}$ and $r^1_j + i\varepsilon \le 3 - \varepsilon\}$. The release dates of the large jobs $J_1, \ldots, J_\lambda$ are reset to new values taken from $R_1, \ldots, R_\lambda$, respectively. Depending on these values, the other release dates and delivery times may also change.

More precisely, our main algorithm performs the following steps.

(S-1) Initialize the solution BestFound to be the empty solution and set the corresponding value V to infinity.

(S-2) For each $(\rho_1, \ldots, \rho_\lambda) \in R_1 \times \cdots \times R_\lambda$ such that $\rho_j \ge \max_{h: J_h \in Pre(j)} \rho_h$ (for $j = 1, \ldots, \lambda$):

(S-2.1) Modify instance $I_1$ to get instance $I_2$ with release dates and delivery times equal to the following values:

$r^2_j := \rho_j$ for $J_j \in L$,

$r^2_j := \max\{r^1_j, \max_{h: J_h \in Pre(j)} r^2_h\}$ for $J_j \in S$,

$q^2_j := \max\{q^1_j, \max_{h: r^2_j < r^2_h} q^2_h\}$ for $J_j \in L$,   (3)

$q^2_j := \max\{q^1_j, \max_{h: J_h \in Suc(j)} q^2_h\}$ for $J_j \in S$.

(S-2.2) Apply the extended Jackson’s rule to instance $I_2$. Let $\Sigma$ and $m(I_2, \Sigma)$ denote the solution and the solution value returned. If $m(I_2, \Sigma) < V$, then let BestFound := $\Sigma$ and V := $m(I_2, \Sigma)$.

(S-3) Return solution BestFound of value V.
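The outer enumeration of Steps (S-1) to (S-3) can be sketched as follows, under these assumptions: exact `Fraction` arithmetic to avoid rounding in the test $r^1_j + i\varepsilon \le 3 - \varepsilon$, a `pre` map listing for each large job the large jobs that must precede it, and a placeholder callback `evaluate` standing in for Steps (S-2.1) and (S-2.2). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def release_grid(r, eps, horizon=3):
    """R_j = { r + i*eps : i in N, r + i*eps <= horizon - eps }."""
    values, i = [], 0
    while r + i * eps <= horizon - eps:
        values.append(r + i * eps)
        i += 1
    return values


def guesses(large, r1, pre, eps):
    """Yield vectors rho (as dicts) with rho_j >= rho_h for every h in Pre(j)."""
    grids = [release_grid(r1[j], eps) for j in large]
    for combo in product(*grids):
        rho = dict(zip(large, combo))
        if all(rho[j] >= rho[h] for j in large for h in pre.get(j, ()) if h in rho):
            yield rho


def best_of(large, r1, pre, eps, evaluate):
    """Steps (S-1)-(S-3); evaluate(rho) -> (schedule, value) is a stand-in."""
    best, value = None, float("inf")          # (S-1)
    for rho in guesses(large, r1, pre, eps):  # (S-2)
        schedule, v = evaluate(rho)
        if v < value:
            best, value = schedule, v
    return best, value                        # (S-3)


# Two large jobs A and C, with A required to precede C.
eps = Fraction(1, 2)
large = ["A", "C"]
r1 = {"A": Fraction(2), "C": Fraction(2)}
pre = {"C": {"A"}}
```

With these values each $R_j$ is $\{2, 5/2\}$, and the constraint of Step (S-2) discards the vector with $\rho_A = 5/2 > \rho_C = 2$, leaving three candidates.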

Step (S-2.1) can be implemented as follows. Release dates and delivery times are updated separately. Consider any fixed ordering of the jobs that is consistent with the precedence relation; this can be obtained by topologically sorting the precedence graph. To update release dates, the jobs are processed in this order; when job $J_j$ is processed, if $J_j$ is a small job then $r^2_j$ is set to $\max\{r^1_j, \max_{h: J_h \in Pre(j)} r^2_h\}$, otherwise to $\rho_j$. Let $J_1, \ldots, J_\nu$ denote any fixed ordering of the jobs that is consistent with the precedence relation and such that $r^2_1 \le \cdots \le r^2_\nu$. (Observe that in the following section we show that if $J_j \prec J_k$ then $r^2_j \le r^2_k$ (see Lemma 3). Therefore it is always possible to find an ordering of the jobs that is consistent with the precedence relation and such that $r^2_1 \le \cdots \le r^2_\nu$.) To update the delivery times, the jobs are processed in the reverse of this order, i.e., from $\nu$ to 1; when job $J_j$ is processed, if $J_j$ is small then $q^2_j$ is set to $\max\{q^1_j, \max_{h: J_h \in Suc(j)} q^2_h\}$, otherwise to $\max\{q^1_j, \max_{h: r^2_j < r^2_h} q^2_h\}$.
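The two passes described above can be sketched in Python as follows. This is an illustrative helper (`update_instance` is a hypothetical name) that assumes `pre[j]` and `suc[j]` give, for each small job, its sets of large predecessors and successors, and that the chosen vector `rho` satisfies the constraint of Step (S-2), so that the stable re-sort by $r^2_j$ remains consistent with the precedences. It uses the straightforward $O(\nu^2)$ scan for large jobs.

```python
def update_instance(order, large, r1, q1, pre, suc, rho):
    """Sketch of Step (S-2.1) (hypothetical helper).

    order    -- job ids in a topological order of the precedence graph
    large    -- set of large job ids
    r1, q1   -- release dates and delivery times of instance I_1
    pre, suc -- small job id -> set of its large predecessors / successors
    rho      -- large job id -> release date chosen from R_j
    Returns the release dates r2 and delivery times q2 of instance I_2.
    """
    r2 = {}
    for j in order:  # every predecessor of j has already been handled
        if j in large:
            r2[j] = rho[j]
        else:
            r2[j] = max([r1[j]] + [r2[h] for h in pre[j]])
    # A stable sort keeps the topological order among equal release dates,
    # so the new order still respects the precedences (cf. Lemma 3).
    by_release = sorted(order, key=lambda j: r2[j])
    q2 = {}
    for j in reversed(by_release):  # from the nu-th job back to the first
        if j in large:
            # Every job released strictly later has already been processed.
            later = [q2[h] for h in q2 if r2[h] > r2[j]]
        else:
            later = [q2[h] for h in suc[j]]
        q2[j] = max([q1[j]] + later)
    return r2, q2


# Toy instance: A < b < C (A, C large; b small); D is small and unrelated.
order = ["A", "b", "C", "D"]
large = {"A", "C"}
r1 = {"A": 0, "b": 0.5, "C": 0.6, "D": 2.5}
q1 = {"A": 3, "b": 2, "C": 1, "D": 1.5}
pre = {"b": {"A"}, "D": set()}
suc = {"b": {"C"}, "D": set()}
rho = {"A": 1, "C": 2}
r2, q2 = update_instance(order, large, r1, q1, pre, suc, rho)
```

In the example the delivery time of the large job C is raised from 1 to 1.5 because D is released strictly later and has a larger delivery time, as prescribed by (3).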



4 Analysis 

It is easy to see that $|R_j| \le 3/\varepsilon$ for every $J_j \in L$. Therefore the number of different vectors $(\rho_1, \ldots, \rho_\lambda)$ can be bounded by $(3/\varepsilon)^{1/\varepsilon}$, since the number $\lambda$ of large jobs is not greater than $1/\varepsilon$. Recall that $\nu$ is the number of jobs in instance $I_1$, and therefore the number of precedences in $I_1$ is at most $\nu^2$. Then Steps (S-2.1) and (S-2.2) can be implemented to run in $O(\nu^2)$ time. Hence, the total running time of Steps (S-1)-(S-3) is $O((3/\varepsilon)^{1/\varepsilon} \cdot \nu^2)$.
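To see how this bound behaves as a function of $\varepsilon$ alone, the following short derivation collapses the iteration count and the per-iteration cost into a single exponential. It is an illustration, not part of the original analysis; it assumes the bounds $\lambda \le 1/\varepsilon$, $|R_j| \le 3/\varepsilon$ and $\nu = 2^{O(1/\varepsilon)}$ (Lemma 1), and $\varepsilon \le 1/2$, so that $\log(3/\varepsilon) = O(\log(1/\varepsilon))$.

```latex
\begin{align*}
\Bigl(\frac{3}{\varepsilon}\Bigr)^{1/\varepsilon}
  &= 2^{\frac{1}{\varepsilon}\log_2\frac{3}{\varepsilon}}
   = 2^{O\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\right)},
  \qquad
  \nu^2 = \bigl(2^{O(1/\varepsilon)}\bigr)^2 = 2^{O(1/\varepsilon)},\\
\Bigl(\frac{3}{\varepsilon}\Bigr)^{1/\varepsilon}\cdot O(\nu^2)
  &= 2^{O\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\right)}
     \cdot 2^{O(1/\varepsilon)}
   = 2^{O\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\right)}.
\end{align*}
```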

The next lemma shows that condition (2) is valid for each instance $I_2$ considered in Step (S-2.1).

Lemma 3. In Step (S-2.1), for each pair of jobs $J_j$ and $J_k$ of instance $I_2$ the following condition holds: $J_j \prec J_k \Rightarrow (r^2_j \le r^2_k$ and $q^2_j \ge q^2_k)$.

Proof. We first prove by induction that

$J_j \prec J_k \Rightarrow r^2_j \le r^2_k$.   (4)

Consider any fixed ordering $J_1, \ldots, J_\nu$ of the jobs that is consistent with the precedence relation. Trivially, condition (4) holds for set $\{J_1\}$. Assume that condition (4) holds for set $N_{k-1} = \{J_1, \ldots, J_{k-1}\}$; we prove that condition (4) then holds for set $N_k = \{J_1, \ldots, J_k\}$, for $2 \le k \le \nu$. If there is no job from set $N_{k-1}$ that must be processed before $J_k$, then condition (4) holds for set $N_k$. Otherwise, we distinguish between the following cases:

1. If $J_k \in L$

(a) and there is a large job $J_j$ from set $N_{k-1}$ such that $J_j \prec J_k$, then $r^2_j \le r^2_k$ since $\rho_j \le \rho_k$ (see Step (S-2));

(b) and there is a small job $J_j$ from set $N_{k-1}$ such that $J_j \prec J_k$, then $r^1_j \le r^1_k \le \rho_k = r^2_k$ by definition of $R_k$ and by Lemma 2; furthermore $Pre(j) \subseteq Pre(k)$, since $J_j \prec J_k$, and $r^2_k = \rho_k \ge \max_{h: J_h \in Pre(k)} \rho_h \ge \max_{h: J_h \in Pre(j)} \rho_h = \max_{h: J_h \in Pre(j)} r^2_h$; therefore

$r^2_k \ge \max\{r^1_j, \max_{h: J_h \in Pre(j)} r^2_h\} = r^2_j$.

2. If $J_k \in S$

(a) and there is a large job $J_j$ from set $N_{k-1}$ such that $J_j \prec J_k$, then $J_j \in Pre(k)$ and therefore $r^2_k = \max\{r^1_k, \max_{h: J_h \in Pre(k)} r^2_h\} \ge r^2_j$;

(b) and there is a small job $J_j$ from set $N_{k-1}$ such that $J_j \prec J_k$, then $r^1_j \le r^1_k$ by Lemma 2; furthermore $Pre(j) \subseteq Pre(k)$ since $J_j \prec J_k$, and $\max_{h: J_h \in Pre(k)} r^2_h \ge \max_{h: J_h \in Pre(j)} r^2_h$; therefore

$r^2_k = \max\{r^1_k, \max_{h: J_h \in Pre(k)} r^2_h\} \ge \max\{r^1_j, \max_{h: J_h \in Pre(j)} r^2_h\} = r^2_j$.

Hence, we have proved that if $J_j \prec J_k$ then $r^2_j \le r^2_k$. This result guarantees that it is always possible to find an ordering of the jobs that is consistent with the precedence relation and such that $r^2_1 \le \cdots \le r^2_\nu$. Let $J_1, \ldots, J_\nu$ denote this ordering. In the following, we prove by induction that

$J_j \prec J_k \Rightarrow q^2_j \ge q^2_k$.   (5)

Trivially, condition (5) holds for set $\{J_\nu\}$. Assume that condition (5) is true for set $N_{j+1} = \{J_{j+1}, \ldots, J_\nu\}$; we prove that condition (5) then holds for set $N_j = \{J_j, \ldots, J_\nu\}$, for $1 \le j \le \nu - 1$. If there is no job from set $N_{j+1}$ that must be processed after $J_j$, then condition (5) holds for set $N_j$. Otherwise, we distinguish between the following cases:

1. If $J_j \in L$

(a) and there is a large job $J_k$ from set $N_{j+1}$ such that $J_j \prec J_k$, then $r^2_j \le r^2_k$, $q^1_j \ge q^1_k$ by Lemma 2, and $\max_{h: r^2_j < r^2_h} q^2_h \ge \max_{h: r^2_k < r^2_h} q^2_h$; it follows that

$q^2_j = \max\{q^1_j, \max_{h: r^2_j < r^2_h} q^2_h\} \ge \max\{q^1_k, \max_{h: r^2_k < r^2_h} q^2_h\} = q^2_k$;

(b) and there is a small job $J_k$ from set $N_{j+1}$ such that $J_j \prec J_k$, then $Suc(k) \subseteq Suc(j)$, and since $J_1, \ldots, J_\nu$ is an ordering of the jobs that is consistent with the precedence relation, we have $Suc(k) \subseteq Suc(j) \subseteq N_{j+1}$. In the previous case (1.a) we have shown that $q^2_j \ge \max_{h: J_h \in Suc(j)} q^2_h$, and hence $q^2_j \ge \max_{h: J_h \in Suc(k)} q^2_h$. By observing that $q^2_j \ge q^1_j$ and $q^1_j \ge q^1_k$ by Lemma 2, we have $q^2_j \ge \max\{q^1_k, \max_{h: J_h \in Suc(k)} q^2_h\} = q^2_k$.

2. If $J_j \in S$

(a) and there is a large job $J_k$ from set $N_{j+1}$ such that $J_j \prec J_k$, then $q^2_j \ge q^2_k$ since $J_k \in Suc(j)$;

(b) and there is a small job $J_k$ from set $N_{j+1}$ such that $J_j \prec J_k$, then $q^1_j \ge q^1_k$ by Lemma 2; furthermore $Suc(j) \supseteq Suc(k)$ since $J_j \prec J_k$, and $\max_{h: J_h \in Suc(j)} q^2_h \ge \max_{h: J_h \in Suc(k)} q^2_h$; therefore

$q^2_j = \max\{q^1_j, \max_{h: J_h \in Suc(j)} q^2_h\} \ge \max\{q^1_k, \max_{h: J_h \in Suc(k)} q^2_h\} = q^2_k$.

□

Now we examine an artificial situation that we shall use as a tool in analyzing our algorithm. Let us focus on instance $I$ and a particular optimal solution $\Sigma^*$ for $I$, consisting of job starting times $s^*_1, s^*_2, \ldots, s^*_n$. If $i\varepsilon \le s^*_j < (i+1)\varepsilon$ for some $i \in \mathbb{N}$, then let $\rho^*_j := i\varepsilon$. Consider the modified instance $I^*$ in which the processing times of all jobs $J_j$ remain unchanged, while release dates $r^*_j$ and delivery times $q^*_j$ are set as follows:

$r^*_j := \rho^*_j$ for $J_j \in L$,

$r^*_j := \max\{r_j, \max_{h: J_h \in Pre(j)} r^*_h\}$ for $J_j \in S$,   (6)

$q^*_j := \max\{q_j, \max_{h: r^*_j < r^*_h} q^*_h\}$ for $J_j \in L$,

$q^*_j := \max\{q_j, \max_{h: J_h \in Suc(j)} q^*_h\}$ for $J_j \in S$.
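The proof of Lemma 4 below uses the fact that $r^*_j \le s^*_j$ for every job, which the text leaves as easy to check. A short derivation of that step follows, assuming, as the use of $\rho_h$ in Step (S-2) suggests, that $Pre(j)$ consists of large jobs that must precede $J_j$, so that $r^*_h = \rho^*_h$ for them.

```latex
\begin{align*}
J_j \in L:&\quad r^*_j = \rho^*_j = i\varepsilon \le s^*_j
  \quad\text{(by the choice of } i\text{)},\\
J_j \in S:&\quad r_j \le s^*_j \ \text{(feasibility of } \Sigma^*\text{)},
  \qquad r^*_h = \rho^*_h \le s^*_h \le s^*_h + p_h \le s^*_j
  \quad \forall\, J_h \in Pre(j),\\
\Longrightarrow&\quad r^*_j = \max\Bigl\{r_j,\ \max_{h:\,J_h \in Pre(j)} r^*_h\Bigr\} \le s^*_j .
\end{align*}
```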

By using the same arguments as in the proof of Lemma 3, we can easily show that condition (2) also holds for instance $I^*$. Furthermore, let $opt(I^*)$ denote the optimum value for $I^*$. Then we have the following lemma.

Lemma 4. $opt(I^*) = opt(I)$.

Proof. Instance $I^*$ is obtained from $I$ by increasing (or leaving unchanged) release dates and delivery times. Therefore, any feasible solution for $I^*$ is also a feasible solution for $I$, and $opt(I^*) \ge opt(I)$. The claim follows by proving that there exists a solution for $I^*$ of value at most $opt(I)$.

Consider the optimal solution $\Sigma^*$ for $I$, consisting of job starting times $s^*_1, s^*_2, \ldots, s^*_n$. It is easy to check that $r^*_j \le s^*_j$ ($j = 1, \ldots, n$) and, therefore, we can schedule the jobs of instance $I^*$ as in $\Sigma^*$: the starting time of job $J_j$ is $s^*_j$, for $j = 1, \ldots, n$. Let $J_c$ be the job that finishes last, i.e., whose delivery is completed last; then the value of this solution is equal to $s^*_c + p_c + q^*_c$. If we prove that $s^*_c + p_c + q^*_c \le opt(I)$, then the claim follows.

We prove that $s^*_c + p_c + q^*_c \le opt(I)$ by induction. Let $J_1, \ldots, J_n$ denote any fixed ordering of the jobs that is consistent with the precedence relation and such that $r^*_1 \le \cdots \le r^*_n$. (Note that this is possible since condition (2) holds for instance $I^*$.) If $c = n$, then $q^*_c = q_c$ and $s^*_c + p_c + q^*_c \le opt(I)$. Otherwise, assume that $s^*_j + p_j + q^*_j \le opt(I)$ for every $j = c + 1, \ldots, n$ (induction hypothesis).

If $J_c$ is a large job, let $J_h$ denote the job with $r^*_c < r^*_h$ and $q^*_h = \max_{j: r^*_c < r^*_j} q^*_j$ (ties are broken arbitrarily). Since $r^*_c < r^*_h$ and $r^*_c = \rho^*_c$, it follows that $r^*_c \le s^*_c < r^*_h \le s^*_h$, and job $J_c$ is processed before job $J_h$ in $\Sigma^*$, i.e., $s^*_h \ge s^*_c + p_c$. From the induction hypothesis we know that $s^*_h + p_h + q^*_h \le opt(I)$, and hence $s^*_c + p_c + p_h + q^*_h \le opt(I)$. Observe that $s^*_c + p_c + q_c \le opt(I)$. It follows that $s^*_c + p_c + \max\{q^*_h, q_c\} \le s^*_c + p_c + \max\{p_h + q^*_h, q_c\} \le opt(I)$. Note that $q^*_c = \max\{q^*_h, q_c\}$, and we have $s^*_c + p_c + q^*_c \le opt(I)$.

Otherwise, if $J_c$ is a small job, let $J_h$ denote the job such that $J_h \in Suc(c)$ and $q^*_h = \max_{j: J_j \in Suc(c)} q^*_j$ (ties are broken arbitrarily). Since $J_h \in Suc(c)$, job $J_c$ must be processed before job $J_h$, i.e., $s^*_h \ge s^*_c + p_c$. From the induction hypothesis we know that $s^*_h + p_h + q^*_h \le opt(I)$, and hence $s^*_c + p_c + p_h + q^*_h \le opt(I)$. It follows that $s^*_c + p_c + \max\{q^*_h, q_c\} \le s^*_c + p_c + \max\{p_h + q^*_h, q_c\} \le opt(I)$, since $s^*_c + p_c + q_c \le opt(I)$. The claim follows by observing that $q^*_c = \max\{q^*_h, q_c\}$. □

By inequality (1) and by definition of large jobs, in any optimal solution the starting time of each large job cannot be later than $3 - \varepsilon$. Therefore, $\rho^*_j \in R_j$ for each $J_j \in L$, and in one of the iterations of Step (S-2) we have $(\rho_1, \ldots, \rho_\lambda) = (\rho^*_1, \ldots, \rho^*_\lambda)$. Now, for simplicity of notation, let us use $I_2$ to denote the modified instance of Step (S-2.1) when $(\rho_1, \ldots, \rho_\lambda) = (\rho^*_1, \ldots, \rho^*_\lambda)$. Consider instance $I_2$ and ungroup all the jobs that have been grouped together in Section 2.3: if job $J_j$ is part of a grouped job $J_G$ then, when we ungroup, we assume that the release date $\tilde r_j$ and delivery time $\tilde q_j$ of job $J_j$ are equal to $r^2_G$ and $q^2_G$, respectively. Let $\tilde I_2$ denote the resulting instance.

Lemma 5. In instance $\tilde I_2$ we have $\tilde r_j = r^*_j$ and $\tilde q_j = q^*_j$, for $j = 1, \ldots, n$.

Proof. In Section 2.2 we modified the precedence structure by adding new precedences among jobs. We claim that the introduction of these precedences does not change the sets $Pre(\cdot)$ and $Suc(\cdot)$ of jobs. Indeed, recall that we restricted the problem such that every back-neighbor (front-neighbor) $J_h$ of $S_i$ ($i = 1, \ldots, \pi$) must be processed before (after) every job from $S_i$. If $J_h$ is a back-neighbor of $S_i$, then there exists a job $J_j \in S_i$ such that $J_h \prec J_j$. It follows that $Pre(h) \subseteq Pre(j)$ and $Suc(j) \subseteq Suc(h)$. Recall that the jobs from $S_i$ have, by definition, the same sets $Pre(j)$ and $Suc(j)$. Therefore, by assuming $J_h \prec J_j$ for each $J_j \in S_i$, we change neither the set $Pre(j)$ of any job $J_j \in S_i$ nor the set $Suc(h)$ of $J_h$. Similarly, if $J_h$ is a front-neighbor of $S_i$, then by assuming that $J_h$ is processed after all jobs from $S_i$, we change neither the set $Suc(j)$ of any job $J_j \in S_i$ nor the set $Pre(h)$. Furthermore, jobs from $S_i$ were further restricted to be processed in any fixed ordering that is consistent with the precedence relation. Clearly, adding precedences among jobs $J_j$ having the same sets $Pre(j)$ and $Suc(j)$ does not change these sets for $J_j$. Therefore, the precedences added in Section 2.2 do not change the execution profile of any job.

According to (6), it is easy to check that small jobs having the same execution profile are set to the same $r^*$ and $q^*$ values. Now, consider the following situation. Replace any subset $H$ of jobs having the same execution profile with a job $J_H$ having the same execution profile as the jobs in $H$ and whose processing time is not greater than $\varepsilon$, i.e., $J_H$ is a small job in the modified set of jobs. Update the release dates and delivery times of this modified set of jobs according to (6); then it is easy to see that $r^*_H = r^*_j$ and $q^*_H = q^*_j$ for each $J_j \in H$. The claim follows by observing that instance $I_1$ is obtained from $I$ by adding new precedences which do not change the execution profile of any job, and by replacing subsets of jobs with new small jobs having the same execution profiles. □

Consider Step (S-2.2), and let $m(I_2, \Sigma)$ denote the value of the solution $\Sigma$ returned by the extended Jackson’s rule when applied to instance $I_2$. In order to show that the described algorithm is a PTAS, it is sufficient to prove the following lemma.

Lemma 6. $m(I_2, \Sigma) \le (1 + \varepsilon)\, opt$.

Proof. We start by making two simple observations. First, consider any set $C$ of jobs belonging to instance $I_2$. Then ungroup the jobs from $C$ and let $C^u$ denote the resulting set of jobs. By Lemma 5, we have the following equation:

$\min_{J_j \in C} r^2_j + \sum_{J_j \in C} p^1_j + \min_{J_j \in C} q^2_j = \min_{J_j \in C^u} r^*_j + \sum_{J_j \in C^u} p_j + \min_{J_j \in C^u} q^*_j$.   (7)

Second, we observe that

$opt(I^*) \ge \min_{J_j \in C^u} r^*_j + \sum_{J_j \in C^u} p_j + \min_{J_j \in C^u} q^*_j$.   (8)
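Inequality (8) is the usual single-machine lower bound: the jobs of any set cannot start before the smallest of their release dates, and whichever of them is processed last completes no earlier than that date plus their total processing time, after which it still needs at least the smallest of their delivery times. The toy check below illustrates this numerically. It is not part of the proof: it compares the bound with a brute-force optimum of a hypothetical three-job instance rather than with $I^*$, assumes nonnegative release dates, and uses illustrative function names.

```python
from itertools import combinations, permutations


def schedule_value(order, jobs):
    """Earliest-start schedule of `order`; returns max over jobs of s + p + q."""
    t, value = 0, float("-inf")  # assumes release dates are nonnegative
    for name in order:
        p, r, q = jobs[name]
        s = max(t, r)  # wait for the release date if necessary
        t = s + p
        value = max(value, t + q)
    return value


def brute_force_opt(jobs, prec):
    """Optimum over all precedence-feasible job orders (tiny instances only)."""
    best = float("inf")
    for order in permutations(jobs):
        pos = {name: i for i, name in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in prec):
            best = min(best, schedule_value(order, jobs))
    return best


def subset_bound(subset, jobs):
    """min r + sum p + min q over a nonempty subset, the right-hand side of (8)."""
    return (min(jobs[j][1] for j in subset)
            + sum(jobs[j][0] for j in subset)
            + min(jobs[j][2] for j in subset))


# Toy instance: name -> (p, r, q), with A required to precede B.
jobs = {"A": (2, 0, 5), "B": (1, 0, 9), "C": (3, 1, 1)}
prec = [("A", "B")]
opt = brute_force_opt(jobs, prec)
bounds = [subset_bound(c, jobs)
          for k in range(1, len(jobs) + 1)
          for c in combinations(jobs, k)]
```

Here the optimum is 12 and the best subset bound is 10, attained by the single job B.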



Let us define a critical job as one that finishes last in $\Sigma$, i.e., whose delivery is completed last. Associated with a critical job $J_c$ there is a critical sequence $B$ consisting of those jobs tracing backward from $J_c$ to the first idle time in the schedule. Let us fix a critical job $J_c$ and call the last job $J_b$ in the critical sequence with $q^2_b < q^2_c$ the interference job for the critical sequence. Let $C$ denote the set of jobs processed after $J_b$ in the critical sequence. By the way $J_b$ was chosen, clearly $q^2_k \ge q^2_c$ for all jobs $J_k \in C$.
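The critical sequence and the interference job can be extracted from a finished schedule with a small helper like the one below. It is a sketch for experimentation, not part of the algorithm: `critical_analysis` is a hypothetical name, and each scheduled job is assumed to be given as a `(name, start, p, q)` tuple in processing order.

```python
def critical_analysis(schedule):
    """schedule: (name, start, p, q) tuples in processing order on one machine.

    Returns (critical, sequence, interference): the job whose delivery
    completes last, the block of jobs run without idle time that ends with it,
    and the last job in that block whose delivery time is smaller than the
    critical job's (None if there is none).
    """
    ends = [s + p + q for (_, s, p, q) in schedule]
    c = max(range(len(schedule)), key=ends.__getitem__)
    first = c
    # Trace backward from the critical job to the first idle time.
    while first > 0:
        _, s_prev, p_prev, _ = schedule[first - 1]
        if s_prev + p_prev < schedule[first][1]:
            break  # the machine was idle just before job `first`
        first -= 1
    sequence = schedule[first:c + 1]
    q_c = schedule[c][3]
    interference = None
    for job in sequence:
        if job[3] < q_c:
            interference = job  # keep the last such job
    return schedule[c], sequence, interference


sched = [("A", 0, 2, 1), ("B", 3, 1, 4), ("C", 4, 2, 6), ("D", 6, 1, 2)]
critical, sequence, interference = critical_analysis(sched)
```

On this toy schedule the critical job is C, the critical sequence is B, C, the interference job is B and $C$ = {C}; the value 12 equals $s_b + p_b + \sum_{J_j \in C} p_j + q_c = 3 + 1 + 2 + 6$, matching the decomposition used in the proof.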

We claim that if there is no interference job then $m(I_2, \Sigma) \le opt$, otherwise $m(I_2, \Sigma) \le opt + p^1_b$. Assume that an interference job $J_b$ exists, and let $s_b$ denote its starting time; then $m(I_2, \Sigma) = s_b + p^1_b + \sum_{J_j \in C} p^1_j + q^2_c$. Let us say that a job is available at time $t$ if it has been released and all its predecessors are scheduled at time $t$. Notice that, for the extended Jackson’s rule to schedule job $J_b$ at time $s_b$, no job $J_k \in C$ can have been available at time $s_b$, since otherwise such a job $J_k$ would have taken priority over job $J_b$. Now, there are two alternative situations that could make a job $J_k \in C$ unavailable at time $s_b$: (1) $r^2_k \le s_b$ but there is a predecessor $J_j$ of $J_k$ that has not yet been scheduled at time $s_b$; (2) no job from $C$ has been released at time $s_b$. Consider case (1) and, without loss of generality, let $J_j$ denote some not yet scheduled predecessor of $J_k$ all of whose predecessors are scheduled. By Lemma 3, we have $r^2_j \le r^2_k$ and $q^2_j \ge q^2_k$. Recall that $q^2_k \ge q^2_c$ for all $J_k \in C$, and $q^2_c > q^2_b$ by definition of interference job. Therefore job $J_j$ has $q^2_j > q^2_b$ and $r^2_j \le s_b$, and it is available at time $s_b$ since all its predecessors are scheduled. By the definition of the extended Jackson’s rule, no unscheduled job $J_j$ with $q^2_j > q^2_b$ could have been available at the time when $J_b$ is processed, since otherwise such a job $J_j$ would have taken priority over job $J_b$. Therefore case (1) is not possible. According to case (2) we have $s_b < \min_{J_j \in C} r^2_j$. Furthermore, by the way $J_b$ was chosen, clearly $q^2_c \le \min_{J_j \in C} q^2_j$. Now we can bound the value of $m(I_2, \Sigma)$, using (7) and (8), as

$m(I_2, \Sigma) < p^1_b + \min_{J_j \in C} r^2_j + \sum_{J_j \in C} p^1_j + \min_{J_j \in C} q^2_j \le p^1_b + opt(I^*) \le p^1_b + opt$.

If there is no interference job, using (7) and (8), it is easy to check that $m(I_2, \Sigma) = \min_{J_j \in B} r^2_j + \sum_{J_j \in B} p^1_j + q^2_c \le opt$.

Hence, if there is no interference job or $J_b$ is a small job, we have $m(I_2, \Sigma) \le (1 + \varepsilon) \cdot opt$, by definition of small jobs (grouped or not). Now, consider the last case, in which the interference job $J_b$ is a large job. Recall that $q^2_c > q^2_b$ and $s_b < \min_{J_j \in C} r^2_j$; thus $r^2_c > r^2_b$. But then $J_b$ cannot be a large job since, according to (3), if $r^2_c > r^2_b$ then $q^2_b \ge q^2_c$. □
By the arguments of Section 2, solution $\Sigma$ can be easily transformed into a feasible solution for the original instance. This results in a $(1 + 3\varepsilon)$-approximate solution, and therefore we have given a polynomial time approximation scheme for $1|r_j, prec|L_{\max}$.

Theorem 1. There exists a PTAS for problem $1|r_j, prec|L_{\max}$ that runs in $O(n + \ell + 2^{O(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon})})$ time.
Abstract. In [12] we started research on a distributed-time extension of Petri nets in which time parameters are associated with tokens and arcs carry constraints that qualify the age of tokens required for enabling. This formalism makes it possible to model, e.g., hardware architectures like GALS. We give a formal definition of process semantics for our model and investigate several properties of local versus global timing: expressiveness, reachability and coverability.



Introduction 

Verification of concurrent and parallel systems nowadays plays an important role in concurrency theory, with a number of successful applications. Algorithmic methods have been developed for process algebras generating infinite-state systems, timed process algebras, Petri nets, lossy vector addition systems, counter machines, real-time systems and many others. In particular, the idea of equipping automata with real time has proved very fruitful, and there are even automatic verification tools for such systems, for example UPPAAL [9] and KRONOS [5].

The main idea behind timed automata is to equip a standard automaton with a number of synchronous clocks, and to allow transitions to be conditioned on clock values and to affect (reset) clocks. One of the objections to this formalism is the assumption of perfect synchrony between clocks. For many applications this assumption is justified, but for others it is unrealistic. Geographically highly distributed systems are clearly prime examples, but the issue has also been addressed for hardware design, e.g. in work on so-called Globally Asynchronous Locally Synchronous (GALS) systems [11].

Following these arguments we suggested in [12] a new model called distributed timed-arc Petri nets (DTAPN). One of the reasons for choosing the Petri net formalism is its explicit representation of locality.

Several models that take time features into account have been presented in the literature (for a survey see [4,17]). For example, timed transition Petri nets were proposed in [13], where transitions are annotated with their durations. A model in which time parameters are associated with places is the timed place Petri nets of [16]. We shall analyse timed-arc Petri nets [3,7], a time extension of Petri nets where time (age) is associated with tokens and the input arcs of transitions are labelled by time intervals which restrict the age of tokens that can be used to fire them. In this model, time is considered to be global, i.e., all tokens grow older at the same speed. Although reachability is decidable for ordinary Petri nets [10], it is undecidable for global timed-arc Petri nets [15]. On the other hand, coverability is decidable for such a model [14,1], which is also known to offer ‘weak’ expressiveness, in the sense that it cannot simulate Turing machines [2].

In [12] a new model is suggested where time elapses in a place independently of other places, taking the view that places represent ‘localities’. The idea of local clocks is generalised in the DTAPN model, where we use an equivalence relation on places to specify which pairs of places must synchronise. As special instances we get local timed-arc Petri nets (LT nets), where no synchronisations are forced, and global timed-arc Petri nets (GT nets), with full synchronisation.

In this paper we give a formal definition of process semantics which provides a reading of the differences between the LT and the GT net models and of their relative strengths. Among the motivations behind LT nets is that they seem to be a weaker model than the global-time one, so that some interesting properties could be verified algorithmically. Nevertheless, we prove that the general reachability problem for LT nets is undecidable. However, we show that a small modification of the problem (a slight restriction of the set of allowed initial markings) makes reachability decidable for LT nets, but not for GT nets. Finally, we argue that coverability is decidable for all DTAPNs.

1 Distributed Timed-Arc Petri Nets 

Definition 1 (Distributed timed-arc Petri net). 

A place/transition Petri net (PT net) is a tuple $(P, T, F)$, where $P$ is a finite set of places, $T$ is a finite set of transitions such that $T \cap P = \emptyset$, and $F \subseteq (P \times T) \cup (T \times P)$ is a flow relation.

A distributed timed-arc Petri net (DTAPN) is a tuple $N = (P, T, F, c, E, D)$, where $(P, T, F)$ is a PT net and:

- $c : F|_{P \times T} \to D \times (D \cup \{\infty\})$ is a time constraint on transitions such that for each arc $(p, t) \in F$, if $c(p, t) = (t_1, t_2)$ then $t_1 \le t_2$,

- $E \subseteq P \times P$ is an equivalence relation on places (synchronisation relation),

- $D \in \{\mathbb{R}^+_0, \mathbb{N}_0\}$ is either continuous or discrete time.

Let $x \in D$ and $c(p, t) = (t_1, t_2)$. We write $x \in c(p, t)$ whenever $t_1 \le x \le t_2$. We also define ${}^\bullet x = \{y \mid (y, x) \in F\}$ and $x^\bullet = \{y \mid (x, y) \in F\}$ for $x \in P \cup T$, and use $B(X)$ to denote the set of all finite multisets over a set $X$. In what follows, we assume that ${}^\bullet t \ne \emptyset$ for every $t \in T$.

A marked PT net is a net $(P, T, F)$ together with an initial marking $M \in B(P)$. A marking of a DTAPN $(P, T, F, c, E, D)$ is a function $M : P \to B(D)$. A marked DTAPN is a pair $(N, M)$, for $M$ a marking of $N$ with all tokens of age 0. Each place is thus assigned a number of tokens, and each token is annotated with a real (natural) number (its age). Let $x \in B(D)$ and $a \in D$. We define $x \oplus a$ to add the value $a$ to every element of $x$, i.e., $x \oplus a = \{b + a \mid b \in x\}$.

The dynamics of DTAPNs is defined by two types of transition relations: firing of a transition and time-elapsing.

Definition 2 (Transition rules).

Let $N = (P, T, F, c, E, D)$ be a DTAPN, $M$ a marking and $t \in T$.

- We say that $t$ is enabled by $M$ iff $\forall p \in {}^\bullet t.\ \exists x \in M(p).\ x \in c(p, t)$.

- If $t$ is enabled by $M$, it can fire producing a marking $M'$, in symbols $M[t\rangle M'$, such that

  $\forall p \in P.\ M'(p) = (M(p) \setminus C^-(p, t)) \cup C^+(t, p)$

  where $C^-$ and $C^+$ are chosen to satisfy the following equations (which may have several solutions):

  $C^-(p, t) = \{x\}$ such that $x \in M(p)$ and $x \in c(p, t)$, if $p \in {}^\bullet t$; $C^-(p, t) = \emptyset$ otherwise,

  $C^+(t, p) = \{0\}$ if $p \in t^\bullet$; $C^+(t, p) = \emptyset$ otherwise.

  Note that the new tokens added to places $t^\bullet$ are of initial age 0.

- We define a time-elapsing transition $e$, for $e : P/E \to D$, as follows, where $[p]_E$ denotes the $E$-equivalence class of $p$:

  $M[e\rangle M'$ iff $\forall p \in P.\ M'(p) = M(p) \oplus e([p]_E)$.

We write $M \to M'$ iff either $M[t\rangle M'$ or $M[e\rangle M'$ for some $t$ or $e$.

Two classes of DTAPNs play prominent roles. The first one requires absolute synchronisation and has been studied in the past, while the other one is a completely asynchronous model suggested in [12]:

- Global timed-arc Petri nets (GT nets): $E = P \times P$,

- Local timed-arc Petri nets (LT nets): $E = \Delta_P = \{(p, p) \mid p \in P\}$.
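A small executable sketch of Definitions 1 and 2 can make the role of the synchronisation relation concrete. It assumes markings are dictionaries from places to lists of token ages, represents $E$ by the list of its equivalence classes, and resolves the nondeterministic choice of a consumed token by taking the first suitable one. The class name `DTAPN` and all identifiers are illustrative. The example uses the net of Figure 3 under the assumption, not stated in the text, that $t_1$ moves a token from $p_1$ to $p_3$ with interval [2,3] and $t_2$ moves a token from $p_2$ to $p_4$ with interval [5,8].

```python
from dataclasses import dataclass


@dataclass
class DTAPN:
    """Minimal DTAPN sketch: markings map each place to a list of token ages."""
    pre: dict       # transition -> list of input places (preset of t)
    post: dict      # transition -> list of output places (postset of t)
    interval: dict  # (place, transition) -> (lo, hi); hi may be float("inf")
    classes: list   # partition of the places into E-equivalence classes

    def _suitable_age(self, marking, p, t):
        """Some token age in p that lies in c(p, t), or None."""
        lo, hi = self.interval[(p, t)]
        return next((age for age in marking[p] if lo <= age <= hi), None)

    def enabled(self, marking, t):
        """t is enabled iff every input place holds a token of suitable age."""
        return all(self._suitable_age(marking, p, t) is not None
                   for p in self.pre[t])

    def fire(self, marking, t):
        """Consume one suitable token per input place, add age-0 tokens to outputs.

        The definition allows any suitable token to be chosen; this sketch
        simply takes the first one it finds.
        """
        if not self.enabled(marking, t):
            raise ValueError(f"{t} is not enabled")
        new = {p: list(ages) for p, ages in marking.items()}
        for p in self.pre[t]:
            new[p].remove(self._suitable_age(marking, p, t))
        for p in self.post[t]:
            new[p].append(0)
        return new

    def elapse(self, marking, delays):
        """Time-elapsing step e: tokens in class i age by delays[i]."""
        new = {}
        for cls, d in zip(self.classes, delays):
            for p in cls:
                new[p] = [age + d for age in marking[p]]
        return new


# Net of Figure 3 as read here (assumed arcs and intervals).
places = ["p1", "p2", "p3", "p4"]
pre = {"t1": ["p1"], "t2": ["p2"]}
post = {"t1": ["p3"], "t2": ["p4"]}
interval = {("p1", "t1"): (2, 3), ("p2", "t2"): (5, 8)}
m0 = {"p1": [0], "p2": [0], "p3": [], "p4": []}

gt = DTAPN(pre, post, interval, [places])               # E = P x P: one clock
lt = DTAPN(pre, post, interval, [[p] for p in places])  # E = identity

# GT: t2 needs 5 time units, after which p1's token is too old for t1.
m_gt = gt.fire(gt.elapse(m0, [5]), "t2")
# LT: only p2's clock advances before t2; p1's clock is advanced afterwards.
m_lt = lt.fire(lt.elapse(m0, [0, 5, 0, 0]), "t2")
m_lt = lt.elapse(m_lt, [2, 0, 0, 0])
```

Under the single GT clock, once $t_2$ has fired the token in $p_1$ is already 5 time units old, so $t_1$ stays disabled forever; with one clock per place (LT), $t_1$ is still enabled after $t_2$ has fired.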



2 GALS Architectures 

In high-performance VLSI design, clock management is the main source of power consumption. Keeping one global clock synchronised is usually the bottleneck of a processor design. In [11] the authors suggest a method to reduce the pitfalls of global clock distribution. A processor design is partitioned into synchronous blocks that communicate globally with each other on an asynchronous basis using a handshake mechanism. This architecture is called Globally Asynchronous and Locally Synchronous (GALS) architecture. The authors applied this technology to a realistic design with a million gates, saving about 30% of power with negligible overhead. Fully asynchronous solutions to the problem have also been examined; for an overview see e.g. [8].

Fig. 1. GALS architecture (synchronous blocks SB1, SB2 and SB3 connected by handshake signals)

Fig. 2. Modelling of handshake mechanism between SB1 and SB2

Distributed timed-arc Petri nets, in particular LT nets in the fully asynchronous case, appear to be a good model for such architectures. Let us now focus on the design of GALS. Figure 1, from [11], represents the basic concept of the GALS architecture for three synchronous components. For each of the synchronous blocks SB1, SB2 and SB3 we design a GT net, joining them together by means of a handshake communication as in Figure 2 (in which every arc from a place $p$ to a transition $t$ is labelled by the time interval $c(p, t)$). This creates the final DTAPN with a synchronisation relation respecting the place partitioning given by the blocks SB1, SB2 and SB3. The transition ‘handshake’ forces the blocks SB1 and SB2 to synchronise, and after this transition is fired new tokens of age 0 appear in places $p'_1$ and $p'_2$. Then SB1 and SB2 can continue their unsynchronised behaviour. If we set $t_1 = t_2 = \infty$, then there are no time constraints on the maximal waiting time for the handshake communication. However, by changing the values $t_1$ and $t_2$ we may forbid a late handshake synchronisation.

Further examples of LT nets and DTAPNs, e.g. timed producer/consumer systems or Fischer’s mutual exclusion protocol, have been described in [12].



3 Process Semantics for DTAPN 



We aim at providing a common ground on which to assess the relative expressiveness of GT nets and LT nets. In this section, building upon the idea of PT net processes [6], we formalise a notion of processes of DTAPNs and establish their properties with respect to firing sequences.

Fig. 3. Dependent transitions in a GT net and independent in an LT net (places p1, p2, p3, p4; arc intervals [2,3] and [5,8]; initial tokens of age 0)

The subtle differences between computations of LT and GT nets that we want to address can be illustrated with the help of the net of Figure 3. Were this an ordinary net, transitions $t_1$ and $t_2$ would be completely independent. Things are not so neat when we consider the time constraints. If the net is a GT net, i.e., time is global, then after firing $t_2$ the transition $t_1$ cannot possibly fire anymore. If instead we consider the net under the local time interpretation, $t_1$ and $t_2$ can again be considered completely independent, as the firing of one cannot affect the enabledness of the other.

Definition 3 (Process Nets).

A process net is a PT net $\Pi = (P, T, F)$ such that $\Pi$ is acyclic, i.e., for $x, y \in \Pi$, $x \prec_\Pi y$ implies $y \not\prec_\Pi x$, and $\Pi$ is deterministic, i.e., for each $p \in P$, $|{}^\bullet p| \le 1$ and $|p^\bullet| \le 1$, where $\prec_\Pi$ denotes the transitive closure of $F$.

Each place $p$ of a process net $\Pi$ has either zero or one transition in its preset. We define ${}^\bullet p = t$ if ${}^\bullet p = \{t\}$ and ${}^\bullet p = \bot$ if ${}^\bullet p = \emptyset$. By $\min(\Pi)$ we denote the set of $\prec_\Pi$-minimal places of $\Pi$, i.e., $\min(\Pi) = \{p \in P \mid {}^\bullet p = \bot\}$. Process nets are implicitly considered marked, with initial marking $\min(\Pi)$. With abuse of notation, in the following we shall write $p \in \Pi$ and $t \in \Pi$, avoiding explicit mention of the components $P$ and $T$ of $\Pi$, as this is not likely to create ambiguity. Analogously, we usually drop the subscript from $\prec_\Pi$.
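The two conditions of Definition 3, and the set $\min(\Pi)$, can be checked mechanically. The following sketch (hypothetical helper names) represents $F$ as a set of `(x, y)` pairs over place and transition names, and uses as its example the PT net underlying Figure 3, assuming arcs $p_1 \to t_1 \to p_3$ and $p_2 \to t_2 \to p_4$.

```python
def transitive_closure(flow):
    """All pairs (x, y) with x <_Pi y, where <_Pi is the transitive closure of F."""
    succ = {}
    for x, y in flow:
        succ.setdefault(x, set()).add(y)
    closure = set()
    for start in succ:
        stack, seen = list(succ[start]), set()
        while stack:
            y = stack.pop()
            if y in seen:
                continue
            seen.add(y)
            closure.add((start, y))
            stack.extend(succ.get(y, ()))
    return closure


def is_process_net(places, flow):
    """Definition 3: acyclic, and every place has |pre| <= 1 and |post| <= 1."""
    if any(x == y for x, y in transitive_closure(flow)):
        return False  # some x <_Pi x, i.e. a cycle
    for p in places:
        preset = [x for x, y in flow if y == p]
        postset = [y for x, y in flow if x == p]
        if len(preset) > 1 or len(postset) > 1:
            return False
    return True


def minimal_places(places, flow):
    """min(Pi): places whose preset is empty (the bottom case)."""
    return {p for p in places if all(y != p for _, y in flow)}


places = {"p1", "p2", "p3", "p4"}
flow = {("p1", "t1"), ("t1", "p3"), ("p2", "t2"), ("t2", "p4")}
```

Adding an arc from $p_1$ to $t_2$ would give $p_1$ two output transitions, so the result would no longer be a process net.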

Definition 4 (PT Process).

A map $\sigma : (P, T, F, M) \to (P', T', F', M')$ of marked PT Petri nets is a function $\sigma : P \cup T \to P' \cup T'$ mapping $P$ to $P'$ and $T$ to $T'$ such that $\sigma(M) = M'$, and for all $t \in T$, $\sigma({}^\bullet t) = {}^\bullet\sigma(t)$ and $\sigma(t^\bullet) = \sigma(t)^\bullet$.

A process $\pi$ of $(N, M)$ is a map $\pi : \Pi \to (N, M)$, for $\Pi$ a finite process net.

The notion of slice, which provides a snapshot of a running process’s state, plays a role in our development. We say that $x, y \in \Pi$ are concurrent if neither $x \prec y$ nor $y \prec x$. A slice of $\pi$ is a maximal set of concurrent places of $\Pi$. E.g., the PT net underlying the net of Figure 3 has precisely four slices: $\{p_1, p_2\}$, $\{p_1, p_4\}$, $\{p_2, p_3\}$, $\{p_3, p_4\}$. We use ${}^\bullet S = \{t \mid t \prec s, s \in S\}$ and $S^\bullet = \{t \mid s \prec t, s \in S\}$ to indicate the two parts into which a slice $S$ partitions the transitions of $\Pi$.
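For small nets the slices, and the split of the transitions induced by a slice, can be enumerated by brute force. The sketch below uses hypothetical names and is exponential in the number of places, so it is only meant for toy examples; it recomputes the four slices listed above for Figure 3, assuming arcs $p_1 \to t_1 \to p_3$ and $p_2 \to t_2 \to p_4$.

```python
from itertools import combinations


def strict_order(flow):
    """Transitive closure of F: the set of pairs (x, y) with x < y."""
    succ = {}
    for x, y in flow:
        succ.setdefault(x, set()).add(y)
    order = set()
    for start in succ:
        stack, seen = list(succ[start]), set()
        while stack:
            y = stack.pop()
            if y not in seen:
                seen.add(y)
                order.add((start, y))
                stack.extend(succ.get(y, ()))
    return order


def slices(places, flow):
    """Maximal sets of pairwise concurrent places (brute force; tiny nets only)."""
    less = strict_order(flow)

    def concurrent(x, y):
        return (x, y) not in less and (y, x) not in less

    cliques = [frozenset(c)
               for k in range(1, len(places) + 1)
               for c in combinations(sorted(places), k)
               if all(concurrent(x, y) for x, y in combinations(c, 2))]
    return [s for s in cliques if not any(s < other for other in cliques)]


def split_transitions(S, transitions, flow):
    """Transitions before S (t < s) and after S (s < t) for some s in S."""
    less = strict_order(flow)
    before = {t for t in transitions if any((t, s) in less for s in S)}
    after = {t for t in transitions if any((s, t) in less for s in S)}
    return before, after


flow = {("p1", "t1"), ("t1", "p3"), ("p2", "t2"), ("t2", "p4")}
fig3_slices = slices({"p1", "p2", "p3", "p4"}, flow)
```

For the slice $\{p_1, p_4\}$, the transitions before the slice are $\{t_2\}$ and those after it are $\{t_1\}$.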

Processes of DTAPNs rest upon the notion of PT net process, enriching it 
with a suitable treatment of time constraints. Each firing of a transition is time- 
stamped with the time elapsed since the process began, according to each of the 
‘clocks’ (E-equivalence classes) involved. 
Definition 5 (DTAPN Net Processes).

Let $N = (P, T, F, c, E, D, M)$ be a marked DTAPN. A process of $N$ is a process $\pi : \Pi \to N$ of the underlying PT net together with a $\le$-totally preordered family $\delta = \{\delta_t : P/E \rightharpoonup D\}_{t \in \Pi}$ of partial functions such that $\delta_t(x)$ is defined if and only if $\pi({}^\bullet t \cup t^\bullet) \cap x \ne \emptyset$ and, for each arc $(p, t)$ of $\Pi$,

$\delta_t([\pi(p)]_E) - \delta_{{}^\bullet p}([\pi(p)]_E) \in c(\pi(p), \pi(t))$,

where, by convention, $\delta_\bot(x) = 0$ for all $x$, and where $\delta_t \le \delta_{t'}$ if $\delta_t(x) \le \delta_{t'}(x)$ for all $x \in \mathrm{dom}(\delta_t) \cap \mathrm{dom}(\delta_{t'})$.

The condition above enforces the time constraints $c$ on the arcs by appropriately bounding the difference between tokens’ creation and consumption times. The special case of $\delta_\bot$ deals with 0-aged tokens in the initial marking. For GT nets, each $\delta_t$ reduces to a single time-stamp according to the (unique) global clock. In the case of multiple clocks, the preorder $\le$ ensures that the time domain is consistent, ruling out situations in which concurrent transitions have incompatible perceptions of the time elapsed. Notice that the linearity condition does not imply sequential processes, as we may have both $\delta_t \le \delta_{t'}$ and $\delta_{t'} \le \delta_t$.

Slices need refinement to adapt to our timed model. Observe, in fact, that $\{p_1, p_4\}$ can never be a slice of any process when the net in Figure 3 is considered as a GT net, as the behaviour in which $t_2$ occurs before $t_1$ is not realisable. We shall thus define a slice of a DTAPN process to be a slice $S$ of the underlying PT process such that $\delta_t \le \delta_{t'}$ for all $t \in {}^\bullet S$ and all $t' \in S^\bullet$.

We now proceed to prove an important sanity condition for our processes, by relating them to firing sequences and markings. In order to extract a marking from a slice, the following definition determines the age of tokens in each place as the difference between the time-stamp of the slice according to clock $x$ (viz. $\max(S, x)$) and the time (according to the same $x$) when the token was generated. This allows us to pass from the absolute time on processes’ transitions to the relative one found in firing sequences’ markings.

Definition 6 (Markings Compatible with a Slice). 

Let $(\pi : \Pi \to N, \delta)$ be a process of a marked DTAPN $N$ and $S$ a slice thereof. Marking $M_S$ is associated to $S$ if only places in $\pi(S)$ are marked, and for each $p \in \pi(S)$,

$$M_S(p) = \{\, \max(S, [p]_E) - \delta_{{}^\bullet b}([p]_E) \mid b \in S,\ \pi(b) = p \,\}$$

where $\max(S, x) = \max\{\delta_t(x) \mid t \in {}^\bullet S,\ x \in \mathrm{dom}(\delta_t)\}$, with the convention that $\max \emptyset = 0$. The set of markings of $N$ compatible with $S$ is

$$m(\pi, S) = \{\, M \mid M_S [e\rangle M, \text{ for } e \text{ a time-elapsing transition of } N \,\}.$$
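Definition 6 can be read as a small computation. The Python sketch below computes $\max(S, x)$ and the multisets of token ages $M_S(p)$. It assumes a hypothetical encoding of a finite process as dicts: `pi` for $\pi$, `producer` for the transition that generates each condition (`None` for initial ones), `delta` for the time-stamps $\delta_t$, and `clock_of` for the class $[p]_E$ of each place. It only produces $M_S$. The markings in $m(\pi, S)$ are those reached from $M_S$ by one further time-elapsing step.

```python
from collections import Counter

def slice_max(pre_S, delta, clock):
    """max(S, x): largest time-stamp on `clock` among transitions in the pre-set of S (max of {} is 0)."""
    return max((delta[t][clock] for t in pre_S if clock in delta[t]), default=0)

def slice_marking(S, pi, producer, delta, clock_of):
    """Token ages of the marking M_S associated with slice S (place -> Counter of ages)."""
    pre_S = {producer[b] for b in S if producer[b] is not None}  # transitions in the pre-set of S
    marking = {}
    for b in S:
        p = pi[b]
        clock = clock_of[p]
        t = producer[b]
        born = 0 if t is None else delta[t][clock]  # delta_bottom(x) = 0 by convention
        age = slice_max(pre_S, delta, clock) - born
        marking.setdefault(p, Counter())[age] += 1
    return marking

# Condition "c" is initial (place p); "b" is produced by t1 (place q); both places share clock X.
example = slice_marking(["b", "c"], {"b": "q", "c": "p"}, {"b": "t1", "c": None},
                        {"t1": {"X": 3}}, {"p": "X", "q": "X"})
```

Here `example` assigns age 0 to the token in `q`, which was just produced, and age 3 to the initial token in `p`.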

Theorem 1. Let $(N, M)$ be a DTAPN and $(\pi, \delta)$ a process thereof. For each slice $S$ of $\pi$ and each $M' \in m(\pi, S)$ there exists a firing sequence $M \to^* M'$.

Proof. By induction on the size of ${}^\bullet S$. The base case, ${}^\bullet S = \emptyset$, is easy. In the induction step, there must be some $t \in {}^\bullet S$ with $t^\bullet \subseteq S$. Among these, choose one with $\le$-maximal $\delta_t$, which exists by the hypothesis on $S$, so as to ensure that $S \setminus t^\bullet$ is a slice. By the induction hypothesis, there is a firing sequence $M \to^* M_1$, where $M_1$ is a marking in $m(\pi, S \setminus t^\bullet)$ such that $M_1 [t\rangle M_2$. Then, by an appropriate time-elapsing transition, $M_2 [e\rangle M'$, as required. □

Theorem 2. Let $(N, M)$ be a DTAPN. For each firing sequence $M \to^* M'$ of $N$, there exists a process $(\pi, \delta)$ such that $M' \in m(\pi, S)$, for $S$ a slice of $\pi$.

Proof. Easy, by induction on the length of the firing sequence. □

The difference between GT nets and LT nets is reflected in our formalisation above in two related aspects. Firstly, GT nets have fewer processes, due to the more stringent synchronisation constraints. Secondly, these processes have fewer slices, that is, less internal concurrency. This is nicely summarised in $\le$, which is a ‘loose’ preorder in the case of LT nets, and essentially becomes a ‘tight’ linear order for GT nets.



4 Reachability and Coverability

LT and GT nets can be compared on the grounds of various decidability questions. Ruiz, Gomez and Escrig recently proved in [15] that reachability is undecidable for GT nets. Their proof does not imply undecidability for LT nets, because it relies on synchronised places. In principle, it may seem that the model of LT nets is less powerful than that of GT nets. Nevertheless, we demonstrate that reachability for LT nets is undecidable as well. The proof is based on a reduction from the halting problem for Minsky machines with two counters. Notice that this contrasts with the result by Mayr [10] stating the decidability of reachability for ordinary Petri nets. The reachability problem for local timed-arc Petri nets can be formulated as follows.

Problem: Reachability for LT nets. 

Instance: A marked LT net $(N, M)$ and a final marking $M'$.

Question: $M \to^* M'$?



Definition 7 (Minsky machine with two counters). 

A Minsky machine $R$ with two counters $c_1$ and $c_2$ is a finite sequence

$$R = (L_1 : I_1,\ L_2 : I_2,\ \ldots,\ L_n : I_n)$$

where $L_1, \ldots, L_n$ are pairwise different labels and $I_1, \ldots, I_n$ are instructions such that each of $I_1, \ldots, I_{n-1}$ is of exactly one of the following types:

— increment: $c_r := c_r + 1$; goto $L_j$

— test and decrement: if $c_r = 0$ then goto $L_j$ else $c_r := c_r - 1$; goto $L_k$

where $1 \le r \le 2$ and $1 \le j, k \le n$. The last instruction $I_n$ is always the special instruction halt.
Fig. 4. Increment instruction



A machine $R$ starts its execution (with given input values of $c_1$ and $c_2$) from the instruction $I_1$, and it halts if it reaches the instruction halt in a finite number of steps. Otherwise it diverges. The halting problem for a machine $R$ with the initial counter values $c_1 = c_2 = 0$ is known to be undecidable. The following variant of the problem is easily seen to be undecidable as well.

Problem: Halting problem with empty counters. 

Instance: A Minsky machine $R$ with $c_1 = c_2 = 0$.

Question: Does $R$ halt with both counters empty?
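Because the halting problem is undecidable, directly simulating a Minsky machine can only semi-decide it. The following Python sketch runs a machine from $c_1 = c_2 = 0$ for a bounded number of steps and returns the final counter values if it halts within that bound. The tuple encoding of the instructions of Definition 7 is a hypothetical choice made for illustration. In the problem above, a positive answer corresponds to the result `(0, 0)`.

```python
def run_minsky(program, max_steps=10_000):
    """Run a two-counter Minsky machine from label 1 with c1 = c2 = 0.

    program maps labels 1..n to instructions (hypothetical encoding):
      ("inc", r, j)       : c_r := c_r + 1; goto L_j
      ("jzdec", r, j, k)  : if c_r = 0 then goto L_j else c_r := c_r - 1; goto L_k
      ("halt",)
    Returns (c1, c2) if halt is reached within max_steps, otherwise None.
    """
    counters = {1: 0, 2: 0}
    label = 1
    for _ in range(max_steps):
        instr = program[label]
        if instr[0] == "halt":
            return counters[1], counters[2]
        if instr[0] == "inc":
            _, r, j = instr
            counters[r] += 1
            label = j
        else:  # test and decrement
            _, r, j, k = instr
            if counters[r] == 0:
                label = j
            else:
                counters[r] -= 1
                label = k
    return None  # step bound exhausted: no verdict
```

For instance, a program that increments $c_1$ twice, then decrements it back to zero and halts, yields `(0, 0)`.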

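Given a Minsky machine $R = (L_1 : I_1, L_2 : I_2, \ldots, L_n : I_n)$ we construct a local timed-arc Petri net $N$ with continuous time which weakly simulates the machine $R$. We define $N = (P, T, F, c, \Delta_P, \mathbb{R}_0^+)$ where

$P = \{p_i \mid 1 \le i \le n,\ I_i \text{ of type increment}\}$
$\quad \cup\ \{p_i \text{ and its auxiliary places (Figure 5)} \mid 1 \le i \le n,\ I_i \text{ of type test and decrement}\}$
$\quad \cup\ \{p_n, c_1, c_2, p_s, p_e\}$,

$T = \{t_i \mid 1 \le i \le n,\ I_i \text{ of type increment}\}$
$\quad \cup\ \{\text{the transitions of Figure 5 for } I_i \mid 1 \le i \le n,\ I_i \text{ of type test and decrement}\}$
$\quad \cup\ \{t_s, t_e\}$,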



F and c are described in the text below. 



For every instruction $L_i : c_r := c_r + 1$; goto $L_j$, with $1 \le i \le n$ and $1 \le r \le 2$, we add the arcs between places and transitions depicted in Figure 4. For every instruction $L_i$ : if $c_r = 0$ then goto $L_j$ else $c_r := c_r - 1$; goto $L_k$, with $1 \le i \le n$ and $1 \le r \le 2$, we add the arcs depicted in Figure 5. Moreover, we add a starting and an ending transition, as illustrated in Figure 6. The initial and final markings $M$ and $M'$ are



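$$M(p) = \begin{cases} \{0\} & \text{if } p = p_s \\ \{0, 0\} & \text{if } p \in \{c_1, c_2\} \\ \emptyset & \text{otherwise} \end{cases} \qquad M'(p) = \begin{cases} \{0\} & \text{if } p = p_e \\ \emptyset & \text{otherwise.} \end{cases}$$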



We prove that $M \to^* M'$ if and only if $R$ halts on the input $c_1 = c_2 = 0$ with both counters empty. Thus we show that reachability for LT nets is undecidable.

Lemma 1. If $R$ halts with both counters empty then $M \to^* M'$.
Fig. 5. Test and decrement instruction





Fig. 6. Start and end of the simulation 



Proof. Suppose that after a finite sequence of instructions executed by $R$ the machine stops in the instruction $L_n$ : halt with both counters empty. Then we can simulate this sequence in $N$ as follows.

First, we let the four tokens in places $c_1$ and $c_2$ reach the age 2 and then we fire the starting transition $t_s$, which puts a token into place $p_1$ and leaves one token of age 2 in each of $c_1$ and $c_2$. An increment instruction $I_i$ is simulated by firing the transition $t_i$ without any time elapsing. If $I_i$ is a test and decrement instruction and the corresponding place $c_r$ contains a token of age 0, we fire the decrementing transition of Figure 5, again without any time-elapsing transition. If the place $c_r$ contains only one token of age 2, then we fire the zero-test sequence of transitions of Figure 5. First, three tokens of age 0 are added and then the token of age 2 is removed. Then we let one time unit pass in the place $c_r$, which means that $c_r$ now contains three tokens of age 1. We consume one of them by firing the next transition of the sequence and let another time unit pass in $c_r$. Then we fire the last transition of the sequence. The resulting marking contains one token of age 2 in $c_r$ and one token of age 0 in $p_j$, and the place $c_{3-r}$ is untouched. Eventually a token of age 0 appears in the place $p_n$ and the places $c_1$ and $c_2$ contain one token of age 2 each. That means that we can fire the ending transition $t_e$ and reach $M'$. □

Lemma 2. If $M \to^* M'$ then the machine $R$ halts with both counters empty.

Proof. We can naturally simulate the behaviour of the net $N$ by executing the corresponding instructions of the machine $R$. The only problematic case is when the zero-test sequence is started although the counter $c_r$ is non-empty. However, we show that if this happens then the marking $M'$ cannot be reached.

First observe that the only transition that can be fired from $M$ is $t_s$, and there must be a time-elapsing transition before it. Thus the resulting marking contains one token of age 0 in $p_1$ and one token of age 2 in both $c_1$ and $c_2$. Notice that whenever a token of age strictly greater than 2 appears in $c_1$ or $c_2$ (we call such a token dead), $M'$ is not reachable. The same happens if a token of age different from 0 appears in any of the places $p_1, p_2, \ldots, p_n$. Thus a time-elapsing transition cannot occur if we aim to reach the marking $M'$. The values of the counters $c_1$ and $c_2$ are represented by the number of tokens of age 0 in the places $c_1$ and $c_2$ respectively, with one additional token of age 2. Suppose that we fire a ‘cheating’ zero-test sequence such that $c_r$ contains, besides one token of age 2, also a non-zero number of tokens of age 0. By examining all possibilities of firing this sequence (we want to avoid dead tokens), we end up with at least two tokens of age 2 in $c_r$, and moreover all tokens in $c_r$ are of age 2. Notice that we cannot fire the transition $t_e$ if there is more than one token in $c_1$ or $c_2$. Should $M'$ be reachable, we have to fire another zero-test sequence. However, now there are at least two tokens of age 2 in $c_r$ and all other tokens are of age 0. During the first part of this sequence no time elapsing is allowed (otherwise dead tokens appear). After a token of age 2 has been removed, there still remains at least one token of age 2, but there is no token of age 1 to enable the next transition of the sequence. This means that a time unit must pass in $c_r$ to enable it, which causes a dead token of age 3 to appear in $c_r$. Thus whenever a ‘cheating’ sequence is fired, the marking $M'$ cannot be reached. This means that the simulation is faithful and if $M \to^* M'$ then $R$ halts with both counters empty. □

Notice that the same construction works also for discrete time. 

Theorem 3. Reachability for LT nets is undecidable.

On the other hand, it is sufficient to restrict the class of nets we consider 
very slightly in order to separate local and global timed nets. 

Definition 8 (Safe marking and safe DTAPN). 

A marking $M : P \to \mathcal{B}(D)$ is safe if $|M(p)| \le 1$ for every $p \in P$. A marked DTAPN $(N, M)$ is safe if the initial marking $M$ is safe.

We now turn to show that the reachability problem for safe LT nets is decidable. Before proving the key decidability lemma, we fix some notation. For $N = (P, T, F, c, E, D)$ a DTAPN, let $N^\circ$ denote the underlying ordinary Petri net $(P, T, F)$. We define a time-forgetting function $f : [P \to \mathcal{B}(D)] \to \mathcal{B}(P)$, mapping DTAPN markings to PT net markings; the multiset $f(M)$ has precisely as many copies of $p$ as there are numbers in $M(p)$. For $M^\circ \in \mathcal{B}(P)$ a marking of $N^\circ$, we use $T(M^\circ) \subseteq [P \to \mathcal{B}(D)]$ to denote the set of all timed markings that have in each place the same number of tokens as $M^\circ$, i.e., $T(M^\circ) = f^{-1}(M^\circ)$.
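The time-forgetting function $f$, the sets $T(M^\circ)$ and the safety condition of Definition 8 are easy to state operationally. This Python sketch assumes that a timed marking is a dict from places to lists of token ages and that a PT marking is a `collections.Counter` of places. Both encodings are illustrative choices, not taken from the paper.

```python
from collections import Counter

def forget_time(marking):
    """f: timed marking (place -> list of ages) -> PT marking (Counter of places)."""
    return Counter({p: len(ages) for p, ages in marking.items() if ages})

def in_T(timed_marking, pt_marking):
    """M1 is in T(M0) iff f(M1) = M0, i.e. the same number of tokens in every place."""
    return forget_time(timed_marking) == +Counter(pt_marking)  # unary + drops zero counts

def is_safe(marking):
    """Definition 8: at most one token in every place."""
    return all(len(ages) <= 1 for ages in marking.values())
```

With these helpers, Lemma 3 states that every timed marking `M1` with `in_T(M1, M0)` is reachable whenever `M0` is reachable in $N^\circ$ from `forget_time(M)`, where `M` is the safe initial marking.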

Lemma 3. Let $(N, M)$ be a safe marked LT net. If $f(M) \to^* M^\circ$ in $N^\circ$ then $M \to^* M_1$ in $N$ for every $M_1 \in T(M^\circ)$.
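Proof. By induction on $k$ we prove that if $f(M) \to^k M^\circ$ in $N^\circ$ then $M \to^* M_1$ in $N$ for every $M_1 \in T(M^\circ)$.

Base case: If $k = 0$ then $M^\circ = f(M)$ and it can easily be seen that $M [e\rangle M_1$ for every $M_1 \in T(f(M))$. Hence $M \to^* M_1$ for every $M_1 \in T(M^\circ)$.

Induction step: Let $k > 0$. Assume that $f(M) \to^{k-1} M^\circ_1 [t\rangle M^\circ_2$ in $N^\circ$. Let us fix an arbitrary $M_2 \in T(M^\circ_2)$. We show that $M \to^* M_2$ in $N$. By $\mathit{min}_p$ we denote $\min\{M_2(p)\}$. We define a marking $M_1$ in $N$ as follows:

$$M_1(p) = \begin{cases} M_2(p) & \text{if } p \in P \setminus ({}^\bullet t \cup t^\bullet) \\ M_2(p) \cup \{x\} & \text{if } p \in {}^\bullet t \setminus t^\bullet \text{ and } c(p, t) = (x, \_) \\ (M_2(p) \oplus (-\mathit{min}_p)) \setminus \{0\} & \text{if } p \in t^\bullet \setminus {}^\bullet t \\ ((M_2(p) \oplus (-\mathit{min}_p)) \setminus \{0\}) \cup \{x\} & \text{if } p \in {}^\bullet t \cap t^\bullet \text{ and } c(p, t) = (x, \_). \end{cases}$$

Obviously, $M_1 \in T(M^\circ_1)$. By the induction hypothesis we know that $M \to^* M_1$ in $N$. We prove our lemma by showing that $M_1 \to^* M_2$. It is, however, easy to see that $M_1 [t\rangle M'_1 [e\rangle M_2$, where

$$M'_1(p) = \begin{cases} M_2(p) & \text{if } p \in P \setminus t^\bullet \\ M_2(p) \oplus (-\mathit{min}_p) & \text{if } p \in t^\bullet \end{cases} \qquad e(p) = \begin{cases} 0 & \text{if } p \in P \setminus t^\bullet \\ \mathit{min}_p & \text{if } p \in t^\bullet. \end{cases}$$

□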



Theorem 4. Reachability is decidable for safe LT nets, but undecidable for safe 
GT nets. 

Proof. Let $(N, M)$ be a safe LT net. Trivially, for any marking $M_1$ reachable from $M$ it is the case that $f(M_1)$ is reachable from $f(M)$ in $N^\circ$. Hence, using Lemma 3, the reachability problem for safe LT nets is reduced to the reachability problem for ordinary Petri nets, and this problem is decidable [10]. Reachability for safe GT nets is undecidable because the initial marking in the undecidability proof of [15] is safe. □

The coverability problem for GT nets was shown to be decidable, for discrete time in [14] and for continuous time in [1]. Following these results, one can prove that coverability is decidable even for DTAPNs.

Problem: Coverability for DTAPNs.

Instance: A marked DTAPN $(N, M)$ and a final marking $M'$.

Question: $\exists M''.\ M \to^* M'' \wedge \forall p \in P.\ M'(p) \subseteq M''(p)$?
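The coverability question only requires each $M'(p)$ to be contained in $M''(p)$ as a multiset of token ages. The following Python sketch checks that containment, assuming that timed markings are dicts mapping places to lists of token ages (an illustrative encoding). It covers only the inclusion test, not the search for a reachable $M''$.

```python
from collections import Counter

def covers(m_big, m_small):
    """True iff m_small(p) is a sub-multiset of m_big(p) for every place p."""
    for p, ages in m_small.items():
        have = Counter(m_big.get(p, []))
        need = Counter(ages)
        if any(have[a] < n for a, n in need.items()):
            return False
    return True
```

Ages have to match exactly. For example, a single token of age 0 in `p` does not cover two tokens of age 0 in `p`.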



Theorem 5. Coverability for DTAPNs is decidable. 

5 Conclusion 

We have studied a recently introduced model of distributed timed-arc Petri nets. The model is well motivated and captures, e.g., the ideas behind the Globally Asynchronous Locally Synchronous paradigm. We provide a formal process semantics for the model and compare the expressiveness of LT nets versus GT nets.

We give answers to the most frequently studied decidability problems for Petri net models, reachability and coverability, finding a very delicate decidability borderline in the case of reachability for LT nets.
Abstract. This paper enhances the linear temporal logic model checking
process with the ability to automatically generate a deductive proof
that the system meets its temporal specification. Thus, we emphasize 
the point of view that model checking can also be used to justify why 
the system actually works. We show that, by exploiting the information 
in the graph that is generated during a failed search for counterexam- 
ples, we can generate a fully deductive proof that the system meets its 
specification. 

1 Introduction 

Model checking [1,2] is an automatic process for verifying temporal properties of finite state systems. Model checking techniques construct a (finite) model that represents the joint computations of the system and the negation of the property to be verified, and apply graph algorithmic techniques to the model to check that no computation of the system satisfies the negation of the property (i.e., violates the property). It was first applied to Linear Time Temporal Logic (LTL) properties in [5,6]. In the automata-theoretic approach ([11], and, independently, [4]) both the system and the (negation of the) property are explicitly represented as automata, and the intersection of the automata is checked for emptiness.

When executions that violate the property exist, at least one is reported and 
serves as a counterexample for the specification. When the search for counterex- 
amples fails one may conclude that the system satisfies its specification. However, 
our confidence in such a positive conclusion is tarnished by two possible factors: 

— For reasons of complexity and decidability, the system that is checked is 
often an oversimplification of the actual system, hence, failure to find coun- 
terexamples for the checked system does not necessarily imply a fault-free 
actual system, since faults of the actual system may have been abstracted 
away. 

— The model checker itself may contain faults, causing it to report success 
when given a faulty system. 

Both these risks may cause us to treat with diffidence a result that merely claims success without providing some supporting evidence, a “witness” or “certificate” that the property does indeed hold over the considered system. This ‘proof by lack of counterexample’ is the main drawback of the model checking approach; some would even say that model checking is a tool for falsification rather than a tool for verification.

An alternative approach to model checking is deductive verification, which incrementally constructs proofs until the desired conclusion, a proof that the system meets its specification, is obtained. Deductive verification is often manual and, like all deductive proofs, requires considerable human skill and time. One of its main benefits is that it often explains why the system satisfies its specification.

In this paper we enhance the LTL model checking process with the ability to 
automatically generate a deductive proof that the system meets its LTL spec- 
ification. Thus, we emphasize the point of view that model checking can also 
be used to justify why the system actually works. We show that, by exploiting 
the information in the graph that is generated during a failed search for coun- 
terexamples, we can generate a fully deductive proof that the system meets its 
specification. 

Several advantages are gained by a checker that produces a counterexample when the property is invalid, and automatically produces a proof when the property is valid. For one, the proof can be independently checked by a theorem prover. (In fact, this can be used as a debugging tool for the model checker.)
Moreover, if the system is a simplification of a more complex actual system, 
the proof can help to justify (or refute), and, in the case of justification, be 
transformed into a deductive proof of the property for the actual system. 

The automata that represent the system and a tester for the LTL property are represented in this paper as Just Discrete Systems (JDS) [3]. The JDS model, which is a variant of the Fair Transition System model [8], is introduced in Section 2. A JDS is a transition system that includes a set of justice requirements, each being an assertion that should be met by a computation of the system infinitely many times. JDSs correspond to Generalized Büchi automata in the automata-theoretic view.

Both the system to be checked and the negation of the property are expressed by JDSs. We construct the synchronous parallel composition of these two JDSs to obtain a new JDS, the computations of which are the system computations that violate the property. If such a computation exists (i.e., the new JDS is feasible), it provides the desired counterexample. Otherwise, the graph generated by the composition is exploited to provide several alternative deductive proofs for the validity of the property over the system’s computations.

The main challenge is how to represent the proof that is implicit in the 
composition JDS. We would like to represent it in a way that would explain 
to the user why the property holds for the checked system. We explore two 
alternatives of representing such a proof. Our first proof system (Section 3) 
automatically generates a proof based on well-founded ranking. 

We then (Section 4) provide an algorithm that generates two temporal logic proof scripts, one proving that the negation of the property is not satisfiable over the system, and the other proving that the property is valid over the system.

The process of generating the second proof script has been implemented in a new system, ProofProd, currently under construction. In Section 5 we present a sample output of ProofProd. While the machine-generated temporal logic proofs seem to be hard to read, minor heuristics can transform them into proofs that resemble efficient human-generated proofs.

Related Work: A preliminary version of this work [10] introduces the concept of generating the proof as a complementary stage of model checking, using Generalized Büchi automata and general proof rules. In independent work, Namjoshi [9] has presented a proof system for the μ-calculus, based on alternating tree automata and parity games. There, LTL is treated as a special case of CTL* using ∀-automata, and fairness is incorporated into the property.

2 Preliminaries 

This section describes Just Discrete Systems (JDSs), the computational model we use here, reviews temporal logic, describes tableaux for JDSs, and shows how to use them for model checking.

2.1 Sequences 

Let $V$ be a set of typed variables. A $V$-state is an interpretation of $V$, assigning to each variable $x \in V$ a value in its respective domain. We denote by $s[x]$ the value $s$ assigns to variable $x$. Let $\Sigma_V$ denote the set of all $V$-states. A sequence over $V$, or a $V$-sequence, is a (possibly infinite) sequence $\sigma = s_0, s_1, \ldots$ over $\Sigma_V$. The length of $\sigma$, $|\sigma|$, is the number of states in a finite $\sigma$ and $\omega$ otherwise.

Given a $V$-sequence $\sigma = s_0, s_1, \ldots$ and some $i$, $0 \le i < |\sigma|$, we denote by $\sigma^i$ the suffix of $\sigma$ obtained by removing its first $i$ elements. For a state assertion $\varphi$, we say that $i$ is a $\varphi$-position of $\sigma$ if $s_i$ is a $\varphi$-state (i.e., $s_i \models \varphi$).

2.2 Just Discrete Systems 

As a computational model for reactive systems, we take the model of just (weakly fair) discrete system (JDS), which is a variation of the model of fair transition system [8]. A JDS $\mathcal{D} : \langle V, O, \Theta, \rho, \mathcal{J} \rangle$ consists of the following components:

• $V = \{u_1, \ldots, u_n\}$ : A finite set of typed system variables, containing data and control variables. The set of states (interpretations) over $V$ is denoted by $\Sigma_V$. When $V$ is clear from the context, we denote $\Sigma_V$ simply by $\Sigma$. Note that $\Sigma$ can be either finite or infinite, depending on the domains of $V$.

• $O = \{o_1, \ldots, o_n\} \subseteq V$ : A finite set of observable variables. These are the variables which the environment can observe.

• $\Theta$ : The initial condition, an assertion (first-order state formula) characterizing the initial states.

• $\rho$ : A transition relation, an assertion $\rho(V, V')$ relating the values $V$ of the variables in state $s \in \Sigma$ to the values $V'$ in a successor state $s' \in \Sigma$. For example, the assertion $x' = x + 1$ corresponds to the assignment $x := x + 1$. The state $s'$ is defined to be a $\mathcal{D}$-successor of state $s$ if $(s, s') \models \rho(V, V')$. That is, $\rho$ evaluates to true when we interpret every $x \in V$ as $s[x]$ and every $x'$ as $s'[x]$.
• $\mathcal{J} : \{J_1, \ldots, J_k\}$ : A set of justice (weak fairness) requirements. The justice requirement $J \in \mathcal{J}$ is an assertion, intended to guarantee that every computation contains infinitely many $J$-states (states satisfying $J$).

We require that every state $s \in \Sigma$ has at least one $\mathcal{D}$-successor. This is often ensured by including in $\rho$ the idling disjunct $V = V'$ (also called the stuttering step). In such cases, every state $s$ is its own $\mathcal{D}$-successor.

A computation of $\mathcal{D} = \langle V, O, \Theta, \rho, \mathcal{J} \rangle$ is an infinite sequence of states

$\sigma : s_0, s_1, s_2, \ldots$

satisfying the following requirements:

• Initiality: $s_0$ is initial, i.e., $s_0 \models \Theta$.

• Consecution: For each $j = 0, 1, \ldots$, state $s_{j+1}$ is a $\mathcal{D}$-successor of state $s_j$.

• Justice: For each $J \in \mathcal{J}$, $\sigma$ contains infinitely many $J$-positions.

We denote by $Comp(\mathcal{D})$ the set of computations of JDS $\mathcal{D}$.
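For a finite-state JDS, infinite sequences of interest can be represented as lassos: a finite prefix followed by a loop that repeats forever. The Python sketch below checks the three requirements above on such a lasso. It assumes an explicit-state encoding in which $\Theta$ and each justice requirement are predicates on states and $\rho$ is a predicate on pairs of states; this encoding is an illustration, not the paper's representation. On a lasso, a justice requirement holds at infinitely many positions exactly when it holds at some state of the loop.

```python
def is_lasso_computation(prefix, loop, theta, rho, justice):
    """Check whether the infinite sequence prefix . loop^omega is a computation."""
    if not loop:
        raise ValueError("loop must be non-empty to describe an infinite sequence")
    path = list(prefix) + list(loop)
    if not theta(path[0]):
        return False                                         # Initiality
    pairs = list(zip(path, path[1:])) + [(loop[-1], loop[0])]
    if not all(rho(s, t) for s, t in pairs):
        return False                                         # Consecution
    return all(any(J(s) for s in loop) for J in justice)     # Justice
```

For example, with states 0, 1, 2, a step relation that either advances modulo 3 or stutters, and the justice requirement "state is 2", the lasso `[0, 1]` followed by the loop `[2]` is a computation. A run that loops forever at state 1 is not.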

A sequence satisfying the requirement of consecution is called a run of $\mathcal{D}$. A run satisfying the requirement of initiality is called an initialized run.

Let $U \subseteq V$ be a subset of the state variables and $s$ be a $V$-state. We denote by $s{\Downarrow}_U$ the $U$-state, called the projection of $s$ on $U$, which is obtained by restricting the interpretation of variables to the variables in $U$.

For a $V$-state sequence $\sigma : s_0, s_1, \ldots$, we denote by $\sigma{\Downarrow}_U$ the projected $U$-state sequence $s_0{\Downarrow}_U, s_1{\Downarrow}_U, \ldots$. An $O$-state sequence $\Omega$ is called an observation of the JDS $\mathcal{D}$ if $\Omega = \sigma{\Downarrow}_O$ for some computation $\sigma$ of $\mathcal{D}$. We denote by $Obs(\mathcal{D})$ the set of observations of JDS $\mathcal{D}$.

Systems $\mathcal{D}_1$ and $\mathcal{D}_2$ are comparable if they have the same sets of observable variables, i.e., $O_1 = O_2$. System $\mathcal{D}_2$ is said to be an abstraction of the comparable system $\mathcal{D}_1$, denoted $\mathcal{D}_1 \sqsubseteq \mathcal{D}_2$, if $Obs(\mathcal{D}_1) \subseteq Obs(\mathcal{D}_2)$. The comparable systems $\mathcal{D}_1$ and $\mathcal{D}_2$ are said to be equivalent, denoted $\mathcal{D}_1 \sim \mathcal{D}_2$, if their sets of observations are identical, that is, $Obs(\mathcal{D}_1) = Obs(\mathcal{D}_2)$. Note that if $\mathcal{D}_1$ and $\mathcal{D}_2$ are equivalent then each of them is an abstraction of the other.

A JDS $\mathcal{D}$ is said to be feasible if $\mathcal{D}$ has at least one computation. $\mathcal{D}$ is defined to be viable if any finite initialized run of $\mathcal{D}$ can be extended to a computation. Systems $\mathcal{D}_1$ and $\mathcal{D}_2$ are compatible if $V_1 \cap V_2 = O_1 \cap O_2$.

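The synchronous parallel composition of the compatible systems $\mathcal{D}_1$ and $\mathcal{D}_2$, denoted by $\mathcal{D}_1 \,|||\, \mathcal{D}_2$, is given by the JDS $\mathcal{D} = \langle V_1 \cup V_2, W_1 \cup W_2, O_1 \cup O_2, \Theta_1 \wedge \Theta_2, \rho_1 \wedge \rho_2, \mathcal{J}_1 \cup \mathcal{J}_2 \rangle$, and is used for LTL model checking as shown below.

Claim. Let $\mathcal{D} = \mathcal{D}_1 \,|||\, \mathcal{D}_2$. Then, a $V$-sequence $\sigma$ is a computation of $\mathcal{D}$ iff $\sigma{\Downarrow}_{V_1}$ is a computation of $\mathcal{D}_1$ and $\sigma{\Downarrow}_{V_2}$ is a computation of $\mathcal{D}_2$. Similarly, the $O$-sequence $\sigma$ is an observation of $\mathcal{D}$ iff $\sigma{\Downarrow}_{O_1}$ is an observation of $\mathcal{D}_1$ and $\sigma{\Downarrow}_{O_2}$ is an observation of $\mathcal{D}_2$.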



2.3 Temporal Logic 

Let $\Pi$ be a set of propositions, which can also be viewed as a set of boolean variables. A $\Pi$-state $s$ is an interpretation of these variables, which we can also represent as an element of $2^\Pi$, i.e., a subset of $\Pi$, where $p \in s$ iff $s[p]$ is interpreted as 1 (true). We consider here linear time propositional temporal logic formulae over $\Pi$, using the Boolean connectives $\vee$ and $\neg$ and the temporal operators next-time $\bigcirc$ and until $\mathcal{U}$. Other Boolean connectives and temporal operators ($\square$, $\diamond$, $\mathcal{V}$, etc.) can be defined using $\vee$, $\neg$, $\bigcirc$ and $\mathcal{U}$. Temporal logic formulae are interpreted over infinite sequences of $\Pi$-states in the usual way (see, e.g., [7]). Thus, $\sigma \models \varphi$ denotes that the infinite $\Pi$-sequence $\sigma$ satisfies the temporal formula $\varphi$. Formula $\varphi$ is said to be satisfiable if there exists a ($\Pi$-)sequence $\sigma$ satisfying $\varphi$. Formula $\varphi$ is valid if $\sigma \models \varphi$ for every $\Pi$-sequence $\sigma$.

Let $\mathcal{D}$ be a JDS with observables $O = \Pi$ and $\varphi$ be a temporal formula over $\Pi$. We say that $\varphi$ is valid over $\mathcal{D}$ ($\mathcal{D}$-valid) if every observation of $\mathcal{D}$ satisfies $\varphi$.

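We assume that all the temporal formulae are given in positive normal form, i.e., with negation applied only to propositions. This can easily be achieved by pushing negation inwards, using the following equivalences:

$$\neg\neg\varphi = \varphi \qquad \neg(\varphi \vee \psi) = (\neg\varphi \wedge \neg\psi) \qquad \neg(\varphi \,\mathcal{U}\, \psi) = (\neg\varphi) \,\mathcal{V}\, (\neg\psi)$$

$$\neg \bigcirc \varphi = \bigcirc \neg \varphi \qquad \neg(\varphi \wedge \psi) = (\neg\varphi \vee \neg\psi) \qquad \neg(\varphi \,\mathcal{V}\, \psi) = (\neg\varphi) \,\mathcal{U}\, (\neg\psi)$$

For an arbitrary formula $\varphi$, we denote by $pos(\varphi)$ the positive form formula which is equivalent to $\varphi$. We write $p \Rightarrow q$ as an abbreviation of $\square(p \rightarrow q)$.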
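The six equivalences above translate directly into a recursive rewriting procedure. The Python sketch below computes $pos(\varphi)$ on a hypothetical tuple encoding of formulae: propositions are strings and operators are named `"not"`, `"or"`, `"and"`, `"next"`, `"until"` and `"release"`, with `"release"` standing for $\mathcal{V}$. It applies only those six rules and recurses into sub-formulae.

```python
def pos(f):
    """Push negations inward until they apply only to propositions."""
    if isinstance(f, str):
        return f                                   # a proposition
    op = f[0]
    if op != "not":
        return (op, *map(pos, f[1:]))              # recurse into the operands
    g = f[1]
    if isinstance(g, str):
        return f                                   # negated proposition: already a literal
    gop = g[0]
    if gop == "not":
        return pos(g[1])                           # not not phi = phi
    if gop == "next":
        return ("next", pos(("not", g[1])))        # not next phi = next not phi
    # De Morgan for or/and, and the until/release duality
    dual = {"or": "and", "and": "or", "until": "release", "release": "until"}
    return (dual[gop], pos(("not", g[1])), pos(("not", g[2])))
```

For example, `pos(("not", ("until", "p", "q")))` returns `("release", ("not", "p"), ("not", "q"))`, which is the third equivalence of the first row.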

2.4 Tableaux and Their Corresponding JDSs

Let $\varphi$ be a temporal logic formula presented in positive normal form. Let $closure(\varphi)$ be the set of formulae including all the sub-formulae of $\varphi$ and the formulae $\bigcirc\psi$ for every $\mathcal{U}$ and $\mathcal{V}$ sub-formula $\psi$ of $\varphi$. A $\varphi$-atom $A$ is a (not necessarily maximal) logically consistent subset of $closure(\varphi)$ that satisfies the following:

1. If $A$ contains a conjunction $p \wedge q$ it must contain both $p$ and $q$.

2. If $A$ contains a disjunction $p \vee q$ it must contain $p$ or $q$.

3. If $A$ contains a formula $\psi = p \,\mathcal{U}\, q$, it must contain $q$, or both $p$ and $\bigcirc\psi$.

4. If $A$ contains $\psi = p \,\mathcal{V}\, q$, it must contain $q$ and either $p$ or $\bigcirc\psi$.
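The closure and the four atom conditions can be checked mechanically. This Python sketch uses a hypothetical tuple encoding of PNF formulae: propositions are strings and operators are named `"not"`, `"or"`, `"and"`, `"next"`, `"until"` and `"release"`, with `"release"` standing for $\mathcal{V}$. `is_atom` tests conditions 1 to 4. As a partial stand-in for logical consistency, it only rejects sets that contain both a proposition and its negation.

```python
def subformulas(f):
    """Yield f and all of its sub-formulae (propositions are strings)."""
    yield f
    if not isinstance(f, str):
        for g in f[1:]:
            yield from subformulas(g)

def closure(phi):
    """Sub-formulae of phi plus next(psi) for every until/release sub-formula psi."""
    subs = set(subformulas(phi))
    nexts = {("next", g) for g in subs
             if not isinstance(g, str) and g[0] in ("until", "release")}
    return subs | nexts

def is_atom(A):
    """Check conditions 1-4 and the absence of complementary literals."""
    for f in A:
        if isinstance(f, str):
            continue
        op = f[0]
        if op == "not" and f[1] in A:
            return False                                   # contains both p and not p
        if op == "and" and not (f[1] in A and f[2] in A):
            return False                                   # condition 1
        if op == "or" and not (f[1] in A or f[2] in A):
            return False                                   # condition 2
        if op == "until" and not (f[2] in A or (f[1] in A and ("next", f) in A)):
            return False                                   # condition 3
        if op == "release" and not (f[2] in A and (f[1] in A or ("next", f) in A)):
            return False                                   # condition 4
    return True
```

For $\varphi = p \,\mathcal{U}\, q$, the closure is $\{p, q, \varphi, \bigcirc\varphi\}$. The set $\{\varphi, p\}$ is not an atom, while $\{\varphi, p, \bigcirc\varphi\}$ is one.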

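For an atom $A$, we denote by $\chi(A)$ the conjunction of all formulae contained in $A$. An atom graph $G_\varphi = (\mathcal{A}, \mathcal{A}_0, E)$ for a formula $\varphi$ consists of a set $\mathcal{A}$ of $\varphi$-atoms, a subset $\mathcal{A}_0 \subseteq \mathcal{A}$ of initial atoms such that $\varphi \in A$ for every $A \in \mathcal{A}_0$, and a set of edges $E \subseteq \mathcal{A} \times \mathcal{A}$ connecting atoms within the graph $G$. If $(A, B) \in E$, then it is required that $p \in B$ for every next-formula $\bigcirc p \in A$.

The atom graph $G = (\mathcal{A}, \mathcal{A}_0, E)$ is said to be a tableau for $\varphi$ if for every $A \in \mathcal{A}$:

$$\varphi \rightarrow \bigvee_{A \in \mathcal{A}_0} \chi(A) \qquad \text{and} \qquad \chi(A) \rightarrow \bigvee_{(A,B) \in E} \bigcirc \chi(B)$$

We refer to a proposition $p \in \Pi$ or a negation of a proposition as a literal. For an atom $A$, we denote by $prop(A)$ the conjunction of all literals contained in $A$.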

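Let $G = (\mathcal{A}, \mathcal{A}_0, E)$ be an atom graph, where $\mathcal{A} = \{A_1, \ldots, A_n\}$. We define $\mathcal{D}_G : \langle V, O, \Theta, \rho, \mathcal{J} \rangle$, the JDS corresponding to $G$, by:

• $V = \Pi \cup \{k : [1..n]\}$, i.e., $V$ contains a control variable $k$, ranging over $[1..n]$.

• $O = \Pi$.

• $\Theta : \bigvee_{A_i \in \mathcal{A}_0} (k = i \wedge prop(A_i))$.

• $\rho : \bigvee_{(A_i, A_j) \in E} (k = i \wedge k' = j \wedge prop(A_j)')$.

• $\mathcal{J} = \{J_{p\mathcal{U}q} \mid p\,\mathcal{U}\,q \in closure(\varphi)\}$. Thus, $\mathcal{J}$ contains a justice requirement $J_{p\mathcal{U}q}$ for every until formula $p\,\mathcal{U}\,q$ contained in $closure(\varphi)$. The justice requirement $J_{p\mathcal{U}q}$ is given by

$$J_{p\mathcal{U}q} : \neg \bigvee_{i \,:\, q \notin A_i,\ p\mathcal{U}q \in A_i} (k = i)$$

In the case that $G$ is a tableau for the formula $\varphi$, we say that $\mathcal{D}_G$ is a temporal tester for $\varphi$ and denote it by $T_\varphi$.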

The following theorem summarizes the properties of a temporal tester:

Theorem 1. Let $T_\varphi$ be a temporal tester for the formula $\varphi$ over the propositional variables $\Pi$. Then, a $\Pi$-sequence $\sigma$ is an observation of $T_\varphi$ iff $\sigma \models \varphi$. Furthermore, let $\sigma = s_0, s_1, \ldots$ be a computation of $T_\varphi$, let $i \ge 0$ be a position, and let $j = s_i[k]$ be the value of $k$ in state $s_i$. Then, $\sigma^i \models \chi(A_j)$.

2.5 Model Checking 

Let $\mathcal{D}$ be a finite-state JDS whose observables are $\Pi$. The goal of model checking is to establish the $\mathcal{D}$-validity of a temporal property $\varphi$. This is accomplished by constructing a JDS whose observations are all the $\mathcal{D}$-observations that satisfy $\neg\varphi$. If this JDS has no observations (equivalently, no computations), then $\varphi$ is $\mathcal{D}$-valid.

Let $\phi = \neg\varphi$, and let $T_\phi$ be a tester for $\phi$. We define the JDS $\mathcal{D}_\phi = \mathcal{D} \,|||\, T_\phi$. The following theorem follows immediately from Claim 2.2 and Theorem 1:

Theorem 2. The formula $\varphi$ is $\mathcal{D}$-valid iff $\mathcal{D}_\phi$ is infeasible.

Consequently, in order to verify that $\varphi$ is $\mathcal{D}$-valid, the process of model checking involves constructing the JDS $\mathcal{D}_\phi$ and checking that it is infeasible.
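For a finite-state $\mathcal{D}_\phi$, infeasibility can be decided with the usual SCC-based emptiness check for justice (generalized Büchi) conditions. The system is feasible iff some reachable SCC that contains a cycle has, for every $J_i$, a node satisfying $J_i$. The Python sketch below is a naive quadratic version of that check, working on an explicit successor map (an assumed encoding). A practical checker would use a linear-time SCC algorithm such as Tarjan's.

```python
def reach(v, succ):
    """All nodes reachable from v, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in succ[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def is_feasible(init, succ, justice):
    """True iff some reachable cyclic SCC satisfies every justice predicate somewhere."""
    reachable = set().union(*(reach(s, succ) for s in init))
    R = {v: reach(v, succ) for v in reachable}
    for v in reachable:
        comp = {u for u in R[v] if v in R[u]}        # SCC of v (mutual reachability)
        cyclic = len(comp) > 1 or v in succ[v]       # excludes singular nodes
        if cyclic and all(any(J(u) for u in comp) for J in justice):
            return True
    return False
```

By Theorem 2, `not is_feasible(...)` on the graph of $\mathcal{D}_\phi$ corresponds to $\varphi$ being $\mathcal{D}$-valid. A witnessing SCC is where a counterexample would be extracted.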

3 A Well-Founded Approach to $\mathcal{D}$-Validity

Let $\varphi$ be a temporal formula, $\mathcal{D}$ be a program, and $\phi = \neg\varphi$. We keep $\varphi$, $\phi$, and $\mathcal{D}$ fixed for the sequel. We describe how to obtain a deductive proof of the $\mathcal{D}$-validity of a property $\varphi$.

The JDS $\mathcal{D}_\phi$ can be viewed as a (labelled) directed graph $G_\phi = (S, S_0, \Gamma)$, where $S$ is the set of states of $\mathcal{D}_\phi$, $S_0$ is the set of initial states, and $\Gamma$ is the set of edges connecting states to their immediate successors. We assume that every node in $S$ is reachable from an initial state.
A well-founded domain $(W, \succ)$ consists of a set $W$ and a total ordering relation $\succ$ over $W$ such that there are no infinitely decreasing chains $a_0 \succ a_1 \succ \cdots$. A ranking function for $\mathcal{D}_\phi$ is a function that maps the states of $\mathcal{D}_\phi$ into a well-founded domain. Assume that the justice requirements of $\mathcal{D}_\phi$ are $\{J_1, \ldots, J_r\}$. Rule WELL, presented in Fig. 1, can be used to prove that $\mathcal{D}_\phi$ is infeasible.



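For a JDS $\mathcal{D}_\phi$ with justice requirements $J_1, \ldots, J_r$,
assertions $\varphi_1, \ldots, \varphi_r$,
a well-founded domain $(W, \succ)$,
and a ranking function $\delta : S \to W$

W1. $\Theta \rightarrow \bigvee_{j=1}^{r} \varphi_j$

W2. $\rho \wedge \varphi_i \rightarrow (\varphi'_i \wedge \delta = \delta') \vee \bigvee_{j=1}^{r} (\varphi'_j \wedge \delta \succ \delta')$, for $i = 1, \ldots, r$

W3. $\rho \wedge \varphi_i \wedge J'_i \rightarrow \bigvee_{j=1}^{r} (\varphi'_j \wedge \delta \succ \delta')$, for $i = 1, \ldots, r$

$\mathcal{D}_\phi$ is infeasible

Fig. 1. Rule WELL.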



Theorem 3. Rule WELL is sound.

Proof: It suffices to show that, given $\varphi_1, \ldots, \varphi_r$, $(W, \succ)$, and $\delta$ that satisfy the three premises W1-W3, $\mathcal{D}_\phi$ is infeasible. Assume to the contrary that $\mathcal{D}_\phi$ has a computation $\sigma : s_0, s_1, \ldots$. Since $\sigma$ is a computation, $s_0 \models \Theta$. By premise W1, there exists some $i_0 \in \{1, \ldots, r\}$ such that $s_0 \models \varphi_{i_0}$. Denote $d_0 = \delta(s_0) \in W$. By premises W2 and W3, there exists an $i_1 \in \{1, \ldots, r\}$ such that $s_1 \models \varphi_{i_1}$ and $d_1 = \delta(s_1) \preceq d_0$, where $d_0 = d_1$ implies $i_1 = i_0$ and $s_1 \not\models J_{i_0}$. Proceeding in this way, we identify an infinite index sequence $i_0, i_1, \ldots$ and an infinite rank sequence $d_0 \succeq d_1 \succeq \cdots$ such that for every $j \ge 0$, $s_j \models \varphi_{i_j}$, $d_j = \delta(s_j)$, and $d_j = d_{j+1}$ implies that $i_j = i_{j+1}$ and $s_{j+1} \not\models J_{i_j}$. Since $W$ is well-founded, the sequence of ranks cannot decrease infinitely many times. Thus, there exists some stabilizing position $c \ge 0$ such that for every $j \ge c$, $d_j = d_c$. It follows that for every $j \ge c$, $i_j = i_c$ and $s_{j+1} \not\models J_{i_c}$. Thus, $\sigma$ violates the justice requirement $J_{i_c}$ and is therefore not a computation of $\mathcal{D}_\phi$. □

We now describe how to automatically obtain the assertions, the well-founded domain, and the ranking function. We take $W = \mathbb{N}$, with $\succ$ the usual $>$ ordering. Consider the dag obtained from $G_\psi$ after its separation into strongly connected components (SCCs). Each $G_\psi$-node is either in some SCC of $G_\psi$ or singular, that is, not part of any SCC. For every SCC $C$, let $\mathrm{UNJUST}_C \subseteq \{1, \ldots, r\}$ be the set of indices of the justice requirements that are not satisfied by any of $C$'s nodes. For every $i = 1, \ldots, r$, let $\varphi_i$ be a formula that describes the set of all singular nodes and nodes that belong to an SCC that violates $J_i$, that is,

$$\varphi_i = \{s : s \text{ is singular, or } s \in C \text{ s.t. } i \in \mathrm{UNJUST}_C\}.$$

The ranking $\delta$ is defined on the nodes of the dag, so all nodes of the same SCC receive the same rank. For all nodes $s$ that are in the dag's leaf nodes, $\delta(s) = 0$. If $\delta(s)$ is undefined, and $\delta$ is defined for all dag-successors $s_1, \ldots, s_k$ of $s$, then $\delta(s) = 1 + \max_{1 \le j \le k} \delta(s_j)$.
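The construction can be sketched in Python, assuming $G_\psi$ is given as an adjacency dictionary and each justice requirement as a set of nodes. The text does not say how the SCCs are computed, so Tarjan's algorithm is used here as a swapped-in, standard choice. It emits components sinks-first, which lets the rank be computed in one pass. The names `tarjan_scc` and `well_certificate` are hypothetical.

```python
def tarjan_scc(nodes, succ):
    """Tarjan's algorithm; components come out sinks-first
    (reverse topological order of the condensation dag)."""
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(frozenset(comp))

    for v in nodes:
        if v not in index:
            visit(v)
    return comps


def well_certificate(nodes, succ, justice):
    """Assertions phi_1..phi_r (as node sets) and ranking delta."""
    comps = tarjan_scc(nodes, succ)
    comp_of = {v: c for c in comps for v in c}
    r = len(justice)
    phi = [set() for _ in range(r)]
    rank = {}
    for c in comps:                         # sinks first
        v = next(iter(c))
        singular = len(c) == 1 and v not in succ[v]
        # Singular nodes go into every phi_i; SCC nodes into phi_i for i in UNJUST_C.
        unjust = range(r) if singular else [i for i in range(r) if not (c & justice[i])]
        for i in unjust:
            phi[i] |= c
        succ_comps = {comp_of[w] for u in c for w in succ[u]} - {c}
        rank[c] = 1 + max((rank[d] for d in succ_comps), default=-1)  # leaves get 0
    delta = {v: rank[comp_of[v]] for v in nodes}
    return phi, delta


phi, delta = well_certificate([0, 1, 2], {0: [1], 1: [2], 2: [1]}, [{0}])
print(phi, delta)  # [{0, 1, 2}] {0: 1, 1: 0, 2: 0}
```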

Lemma 1 (Completeness). If $\mathcal{D}_\psi$ is infeasible, then the procedure above produces assertions, a well-founded domain, and a ranking function that satisfy the premises of WELL.

Proof: Since $\mathcal{D}_\psi$ is infeasible, every SCC violates some $J_i$, and therefore any node that is in an SCC satisfies some $\varphi_i$. All singular nodes trivially satisfy all $\varphi_i$'s. Hence, every $G_\psi$-state satisfies some $\varphi_i$, and in particular W1 holds. From the construction it is easy to see that every state leads either to a state with the same ranking (in the same SCC) or to a node with a lower ranking (outside the SCC); thus W2 holds. Assume that $s \models \varphi_i$, $s' \models J_i$, and $(s, s') \in \rho$. Since $s \models \varphi_i$, $s$ is either singular or belongs to an SCC none of whose nodes satisfies $J_i$. Hence, $s'$ and $s$ are in different SCCs. Consequently, $\delta(s) > \delta(s')$, which establishes W3. □

4 From Falsification to Verification 

This section presents a procedure that exploits the information in $G_\psi$ to generate deductive temporal proofs of $\varphi$'s P-validity. In particular, we present an algorithm that simultaneously generates two proof scripts, one of $\mathcal{D}_\psi$'s infeasibility and one of $\varphi$'s P-validity. Thus, the first proof script establishes the falsification of $\psi$, and the second establishes the verification of $\varphi$.

Let $s \in \Sigma$ be a state of $\mathcal{D}_\psi$ and assume that the state variables of $P$ are $V = \{x_1, \ldots, x_\ell\}$. Let $\Lambda(s) = \bigwedge_{i=1}^{\ell} (x_i = s[x_i])$. Note that the only $\mathcal{D}_\psi$-variable not included in $V$ is $\kappa$. We denote by $A_s$ the atom $A_j$, where $j = s[\kappa]$ is the value $s$ assigns to $\kappa$. We define

$$X(s) = \Lambda(s) \wedge \chi(A_s) \qquad\text{and}\qquad \chi(s) = \neg\chi(A_s).$$

The formula $X(s)$ is the characteristic formula of $s$. Its intended meaning is to describe the temporal formulae that hold when a computation of $\mathcal{D}_\psi$ reaches $s$. The formula $\chi(s)$ describes the temporal formulae that hold when an initial run of an infeasible $\mathcal{D}_\psi$ reaches $s$. Note that $X(s)$ does not refer to the state variable $\kappa$. However, since $\chi(A_j)$ uniquely identifies $\kappa$ as having the value $j$, this is not necessary.

The properties of the characteristic formulae and of the $\chi$'s are stated in the following lemma, whose proof follows immediately from the definitions and from Theorem 1:

Lemma 2. For every computation $\sigma = s_0, s_1, \ldots$ of $\mathcal{D}_\psi$ and every $i \ge 0$, $\sigma^i \models X(s_i)$. Similarly, if $\mathcal{D}_\psi$ is infeasible, then for every run $\sigma = s_0, s_1, \ldots$ of $\mathcal{D}_\psi$ and every $i \ge 0$, $\sigma^i \models \chi(s_i)$.
An immediate corollary of Lemma 2 is:

Corollary 1. $\mathcal{D}_\psi$ is infeasible iff $X(s_0) \Rightarrow F$ for every $s_0 \in \Sigma_0$. Similarly, $\varphi$ is P-valid iff $\Lambda(s_0) \Rightarrow \chi(s_0)$ for every $s_0 \in \Sigma_0$.

Based on Corollary 1, we describe a procedure that attempts to show that $X(s) \Rightarrow F$ (for the first proof script) and that $\Lambda(s) \Rightarrow \chi(s)$ (for the second proof script) for every state $s \in \Sigma$. If $\varphi$ is P-valid, our procedure terminates after showing the above for every initial state $s_0 \in \Sigma_0$, which, by Corollary 1, achieves our goal. If $\varphi$ is not P-valid (and $\mathcal{D}_\psi$ is feasible), our procedure terminates without having shown the above for every $s_0 \in \Sigma_0$.

Both proofs are achieved by a chain of temporal formulae that are P-valid, each of which follows either immediately from the properties of $\mathcal{D}_\psi$ or from the previous P-validities in the chain. Initially, the proof is empty. It then proceeds according to the procedure in Figure 2.

The procedure in Figure 2 describes how both proof scripts are constructed. The procedure resembles the completeness proof of [5,6], but while the procedure there is purely semantic (working on the graph), here we give it a syntactic flavor.



(a) For each program justice requirement $J \in \mathcal{J}$, add to the proof the line

$\Box\Diamond J$

(b) For every state $s \in \Sigma$, add to the proof the line

$X(s) \Rightarrow \bigvee_{(s, s') \in \rho} \bigcirc X(s')$

(c) Let all of $G_\psi$'s nodes be "unmarked". Repeat the following until all nodes are marked:

(c.1) If there is an unmarked node $s$ all of whose successors are marked, then mark $s$ and add to the proof the lines

$X(s) \Rightarrow F$ and $\Lambda(s) \Rightarrow \chi(s)$

(c.2) If there is an SCC $C$ of $G_\psi$, all of whose nodes are unmarked, such that all exits from $C$ lead to marked nodes and, for some justice requirement $J$ of $\mathcal{D}_\psi$, no state in $C$ is a $J$-state, then for every node $s \in C$ mark $s$ and add to the proof the lines

$X(s) \Rightarrow F$ and $\Lambda(s) \Rightarrow \chi(s)$

(c.3) If no node was marked in either (c.1) or (c.2), report a counterexample and halt.



Fig. 2. A Procedure to Generate a Temporal Proof of P-validity 
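The marking loop of step (c) can be sketched in Python on an explicit graph. This is a simplified illustration, not ProofProd: it only records which rule marks which nodes and emits no formulae. It assumes that the justice requirements of $\mathcal{D}_\psi$ (program justice plus the tableau's promises) are given as sets of nodes. SCCs are found by mutual reachability, which is adequate for small graphs, and the name `mark_procedure` is hypothetical.

```python
def reach_plus(succ, s):
    """Nodes reachable from s in one or more steps."""
    seen, stack = set(), list(succ[s])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(succ[t])
    return seen


def mark_procedure(nodes, succ, justice):
    """Steps (c.1)-(c.3); returns (all_marked, log of marking steps)."""
    R = {s: reach_plus(succ, s) for s in nodes}
    # Non-trivial SCCs: sets of mutually reachable nodes that lie on a cycle.
    comps = {frozenset(t for t in nodes if t in R[s] and s in R[t])
             for s in nodes if s in R[s]}
    marked, log = set(), []
    while len(marked) < len(nodes):
        s = next((s for s in nodes
                  if s not in marked and all(t in marked for t in succ[s])), None)
        if s is not None:                   # step (c.1)
            marked.add(s)
            log.append(("c.1", {s}))
            continue
        for c in comps:                     # step (c.2)
            exits = {t for v in c for t in succ[v]} - c
            if not (c & marked) and exits <= marked and any(not (c & j) for j in justice):
                marked |= c
                log.append(("c.2", set(c)))
                break
        else:                               # step (c.3): nothing could be marked
            return False, log
    return True, log


print(mark_procedure([0, 1, 2], {0: [1], 1: [2], 2: [1]}, [{0}]))
# (True, [('c.2', {1, 2}), ('c.1', {0})])
```

The `for ... else` clause fires only when no SCC qualified for (c.2), which matches the halting condition of step (c.3).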



Theorem 4. If the procedure of Figure 2 terminates with all nodes marked, then for every initial node $s_0$, $X(s_0) \Rightarrow F$ and $\Lambda(s_0) \Rightarrow \chi(s_0)$. Moreover, for every initial node $s_0$, the formulae generated by the first proof script constitute a proof that $X(s_0) \Rightarrow F$, and the formulae generated by the second proof script constitute a proof that $\Lambda(s_0) \Rightarrow \chi(s_0)$.

Proof Outline: Note first that the procedure marks all nodes only if there is no path in $\mathcal{D}_\psi$ leading from an initial node to an SCC that satisfies every justice requirement. Thus, the procedure terminates with all nodes marked only when $\mathcal{D}_\psi$ is infeasible.

The validity of steps (a) and (b) is immediate. In step (c.1) we are considering (and marking, if possible) a node that does not belong to any SCC of the graph. That is, we are removing a state $s$ which, at the time of removal, has all its successors marked. Assume that the successors of $s$ in the original graph are $s_1, \ldots, s_k$. In step (b) we added to the proof the line

$$X(s) \Rightarrow \bigvee_{i=1}^{k} \bigcirc X(s_i) \qquad (1)$$

Since all of the successors are marked, it follows that for each $i = 1, \ldots, k$, the first and second proofs contain the lines $X(s_i) \Rightarrow F$ and $\Lambda(s_i) \Rightarrow \chi(s_i)$, respectively. Combining these lines with formula (1), we conclude $X(s) \Rightarrow F$ and $\Lambda(s) \Rightarrow \chi(s)$.

In step (c.2) we are dealing with an SCC $C$, all of whose nodes are unmarked and all of whose exit edges lead to marked nodes. Consider a state $s \in C$. Viewing again proof line (b) for state $s$, we observe that for every immediate successor $s'$ of $s$ which lies outside of $C$, the proofs contain the lines $X(s') \Rightarrow F$ and $\Lambda(s') \Rightarrow \chi(s')$, respectively. Consequently, we can reduce formula (1) into the formula $X(s) \Rightarrow \bigvee_{(s, s') \in \rho,\, s' \in C} \bigcirc X(s')$, which in turn can be weakened into $X(s) \Rightarrow \bigcirc\big(\bigvee_{s' \in C} X(s')\big)$. Taking the disjunction of the above over all $s \in C$, we obtain $\bigvee_{s \in C} X(s) \Rightarrow \bigcirc\big(\bigvee_{s \in C} X(s)\big)$, from which, by the axioms of LTL, we can infer that for every $s \in C$,

$$X(s) \Rightarrow \Box\Big(\bigvee_{s' \in C} X(s')\Big) \qquad (2)$$

There are two possible reasons why SCC $C$ was marked. Either it failed to satisfy one of the program justice requirements $J \in \mathcal{J}$, or one of the atoms $A_s$ for some $s \in C$ contained the formula $p\,\mathcal{U}\,q$ and no state within $C$ satisfies $q$.

In the first case, the failure to satisfy $J$ implies that $X(s) \rightarrow \neg J$ for every $s \in C$. By formula (2), this implies $X(s) \Rightarrow \Box\neg J$, which, in view of the line $\Box\Diamond J$ originally placed in the proof, leads to $X(s) \Rightarrow F$. Since $X(s) \Leftrightarrow (\Lambda(s) \wedge \neg\chi(s))$, it follows that $\Lambda(s) \Rightarrow \chi(s)$.

Consider the second case, in which all atoms within $C$ contain the formula $p\,\mathcal{U}\,q$ but do not contain the formula $q$. Let $s$ be a node within $C$. Since $C$ is strongly connected, $s$ must have an immediate predecessor $\tilde{s} \in C$. Let $s_1, \ldots, s_k$ be all the nodes which are immediate successors of $\tilde{s}$ such that $q \in A_{s_i}$ for every $i = 1, \ldots, k$. From the construction it can be established that

$$X(s) \wedge q \Rightarrow \bigvee_{i=1}^{k} X(s_i) \qquad (3)$$

Since no node within $C$ contains $q$ in its atom, it follows that $s_1, \ldots, s_k$ are outside of $C$ and, as all exits from $C$ lead to marked nodes, the proofs must contain the lines $X(s_i) \Rightarrow F$ and $\Lambda(s_i) \Rightarrow \chi(s_i)$, respectively, for every $i = 1, \ldots, k$. Together with formula (3), this implies $X(s) \wedge q \Rightarrow F$. It follows that $X(s) \Rightarrow (p\,\mathcal{U}\,q \wedge \neg q)$ for every $s \in C$.

By formula (2), this implies $X(s) \Rightarrow \Box(p\,\mathcal{U}\,q) \wedge \Box\neg q$. Since $p\,\mathcal{U}\,q$ implies $\Diamond q$, this is equivalent to $X(s) \Rightarrow F$ and to $\Lambda(s) \Rightarrow \chi(s)$. □

5 Experimental Results 

We have constructed ProofProd, a prototype system that generates temporal proofs as described above. For example, we ran ProofProd on the following JDS $P$: $V$ consists of $L \in \{0, 1, 2\}$ and $p \in \{t, f\}$. We use $L_j$ to denote $L = j$. $\Theta$ is $L_0 \wedge p$. The single justice requirement is $J = L_0 \vee L_2$. The transition relation $\rho$ is given by

$$(L_0 \wedge L'_0 \wedge p') \vee ((L_0 \vee L_1) \wedge L'_1 \wedge \neg p') \vee ((L_1 \vee L_2) \wedge L'_2 \wedge p').$$

The property whose P-validity we would like to establish is $\varphi = \Diamond\Box p$. The atom graph for $\psi = \neg\varphi = \Box\Diamond\neg p$ is described in Fig. 3. There, source-less edges point to initial states, and double circles denote states belonging to the (single) justice set $J_\psi$.
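As an independent sanity check of this example, its state space can be explored directly. The Python sketch below is not the proof-generating procedure. It is a plain explicit-state check that relies on the transcription of $\Theta$, $\rho$ and $J$ given above, with states encoded as hypothetical `(L, p)` pairs. With a single justice requirement, $\Diamond\Box p$ holds on every just computation iff no reachable cycle visits both a $J$-state and a $\neg p$-state.

```python
from itertools import product

# States of the example JDS: (L, p) with L in {0, 1, 2} and p a Boolean.
STATES = list(product(range(3), [True, False]))


def rho(s, t):
    """Transition relation of the example, transcribed from the text."""
    (l, p), (l2, p2) = s, t
    return ((l == 0 and l2 == 0 and p2)
            or (l in (0, 1) and l2 == 1 and not p2)
            or (l in (1, 2) and l2 == 2 and p2))


def initial(s):          # Theta = L0 and p
    return s == (0, True)


def just(s):             # J = L0 or L2
    return s[0] in (0, 2)


SUCC = {s: [t for t in STATES if rho(s, t)] for s in STATES}


def reach_plus(s):
    """States reachable from s in one or more steps."""
    seen, stack = set(), list(SUCC[s])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(SUCC[t])
    return seen


REACH = {s: reach_plus(s) for s in STATES}
reachable = {s for s in STATES if initial(s)}
reachable |= set().union(*(REACH[s] for s in reachable))

# A J-state and a not-p state on a common reachable cycle would give a just
# computation violating <>[]p.
violations = [(s, t) for s in reachable for t in reachable
              if just(s) and not t[1] and t in REACH[s] and s in REACH[t]]
print(sorted(reachable), "P-valid" if not violations else violations)
# [(0, True), (1, False), (2, True)] P-valid
```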




Fig. 4 contains the proof that ProofProd generated, where we used an option that displays the proof in LaTeX. We are currently working on heuristics to simplify temporal logic formulae that will allow us to make the proof more compact. Fairly simple transformations can establish that each of $\Box p$, $\bigcirc\Diamond\Box p$, and $\bigcirc\Box p$ implies $\varphi = \Diamond\Box p$, which helps simplify lines (6)-(9) of the proof in Fig. 4 to

(6') $(L_2 \wedge p) \Rightarrow \varphi$
(7') $(L_1 \wedge \neg p) \Rightarrow (\varphi \vee p)$
(8') $(L_1 \wedge \neg p) \Rightarrow \varphi$
(9') $(L_0 \wedge p) \Rightarrow \varphi$



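(1) $\Box\Diamond(L_0 \vee L_2)$

(2) $(L_0 \wedge p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \bigcirc\Diamond\neg p) \Rightarrow \bigcirc\big((L_0 \wedge p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \bigcirc\Diamond\neg p) \vee (L_1 \wedge \neg p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \bigcirc\Diamond\neg p) \vee (L_1 \wedge \neg p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \neg p)\big)$

(3) $(L_1 \wedge \neg p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \bigcirc\Diamond\neg p) \Rightarrow \bigcirc\big((L_1 \wedge \neg p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \bigcirc\Diamond\neg p) \vee (L_1 \wedge \neg p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \neg p) \vee (L_2 \wedge p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \bigcirc\Diamond\neg p)\big)$

(4) $(L_1 \wedge \neg p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \neg p) \Rightarrow \bigcirc\big((L_1 \wedge \neg p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \bigcirc\Diamond\neg p) \vee (L_1 \wedge \neg p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \neg p) \vee (L_2 \wedge p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \bigcirc\Diamond\neg p)\big)$

(5) $(L_2 \wedge p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \bigcirc\Diamond\neg p) \Rightarrow \bigcirc(L_2 \wedge p \wedge \psi \wedge \bigcirc\psi \wedge \Diamond\neg p \wedge \bigcirc\Diamond\neg p)$

(6) $(L_2 \wedge p) \Rightarrow (\varphi \vee \bigcirc\varphi \vee \Box p \vee \bigcirc\Box p)$

(7) $(L_1 \wedge \neg p) \Rightarrow (\varphi \vee \bigcirc\varphi \vee \Box p \vee p)$

(8) $(L_1 \wedge \neg p) \Rightarrow (\varphi \vee \bigcirc\varphi \vee \Box p \vee \bigcirc\Box p)$

(9) $(L_0 \wedge p) \Rightarrow (\varphi \vee \bigcirc\varphi \vee \Box p \vee \bigcirc\Box p)$

Fig. 4. Proof script of the P-validity of $\varphi$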
Another set of heuristics is more specific to the type of formulae we obtain in the proof script. Consider the procedure of Figure 2. For every SCC $C$, let $exit(C) = \{s' : \text{for some } (s, s') \in \rho,\ s \in C \text{ and } s' \notin C\}$; that is, $exit(C)$ is the set of all nodes that reside directly outside of $C$. It is easy to see that, if the procedure terminates when all nodes are marked, then for every SCC $C$ that is marked in step (c.2) because it fails to satisfy a program justice requirement, and every $s \in C$,

$$\Lambda(s) \Rightarrow \Big(\bigvee_{t \in C} \Lambda(t) \wedge \chi(t)\Big)\ \mathcal{W}\ \Big(\bigvee_{t \in exit(C)} \Lambda(t)\Big) \qquad (4)$$

Similarly, for every SCC $C$ that is marked in step (c.2) because it fails to satisfy a $p\,\mathcal{U}\,q$ justice requirement,

$$\Lambda(s) \Rightarrow \Big(\bigvee_{t \in C} \Lambda(t) \wedge \chi(t)\Big)\ \mathcal{W}\ \Big(\bigvee_{t \in exit(C)} \chi(t)\Big) \qquad (5)$$

In our example, the SCC that consists of the single node whose $\Lambda$ is $L_2 \wedge p$ is removed because it fails to satisfy the justice requirement $\Diamond\neg p$, and its set of exit nodes is empty. We therefore obtain from formula (5) that $L_2 \wedge p \Rightarrow (L_2 \wedge p \wedge (\varphi \vee p))\ \mathcal{W}\ F$, which implies (by LTL theorems and the simplification rules above) that $L_2 \wedge p \Rightarrow \Box(L_2 \wedge p)$. Using the simplification rule "$\Box(p \wedge q) \rightarrow \Box p$", we can obtain $L_2 \wedge p \Rightarrow \Box p$ instead of line (6) in the proof.

These, and other heuristics we are working on, help to greatly simplify the proof scripts generated by ProofProd, making the proof methodology advocated here both practical and beneficial.

6 Conclusion and Future Work 

The paper demonstrates how model checking, which is commonly considered useful only for purposes of falsification, can be used to obtain deductive verification.

As reported in Section 5, we are currently working on heuristics for our system that will generate proofs that are closer to "human" proofs. We are also working on extending our results to apply to systems that employ a wider set of fairness constraints, as well as on obtaining deductive proofs from symbolic model checkers.

Acknowledgement. We would like to thank Yi Fang for her technical assistance with ProofProd, and Jessie Xu and Yonit Kesten for careful proofreading of the manuscript.
Abstract. We say a polynomial $P$ over $\mathbb{Z}_M$ strongly $M$-represents a Boolean function $F$ if $F(x) \equiv P(x) \pmod M$ for all $x \in \{0,1\}^n$. Similarly, $P$ one-sidedly $M$-represents $F$ if $F(x) = 0 \iff P(x) \equiv 0 \pmod M$ for all $x \in \{0,1\}^n$. Lower bounds are obtained on the degree and the number of monomials of polynomials over $\mathbb{Z}_M$ which strongly or one-sidedly $M$-represent the Boolean function deciding if a given $n$-bit integer is square-free. Similar lower bounds are also obtained for polynomials over the reals which provide a threshold representation of the above Boolean function.



1 Introduction 

In this paper, we obtain lower bounds on the degree and the number of monomials of polynomials over $\mathbb{Z}_M$ which strongly or one-sidedly $M$-represent the Boolean function deciding if a given $n$-bit integer is square-free. These results provide the first non-trivial lower bounds over $\mathbb{Z}_M$ on the complexity of a number theoretic problem which is closely related to the integer factorization problem. Similar lower bounds are also obtained for polynomials over the reals which provide a threshold representation of the above Boolean function.

We also show that some simple number theoretic observations allow us to 
obtain quite strong lower bounds on several other complexity characteristics of 
testing if a given integer is square-free. 

We recall that an integer $x$ is called square-free if there is no prime $p$ such that $p^2 \mid x$. Otherwise, $x$ is called square-full. We define the function

$$S(x) = \begin{cases} 1, & \text{if } x \text{ is square-free},\\ 0, & \text{if } x \text{ is square-full}. \end{cases}$$
For a given integer $n \ge 1$, we can identify $x$, $0 \le x \le 2^n - 1$, with its bit representation $x_1 \ldots x_n$ (adding leading zeroes if necessary) and consider $S(x)$ as a Boolean function of $n$ variables.
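A direct implementation makes the definition and the bit convention concrete. The following Python sketch uses trial division, which is fine for small $x$. It treats $x = 0$ as square-full, since $0$ is divisible by every $p^2$. The helper names are hypothetical.

```python
def is_square_free(x):
    """True iff no prime p satisfies p**2 | x; x = 0 counts as square-full."""
    if x == 0:
        return False
    d = 2
    while d * d <= x:
        # Testing every d >= 2 is enough: if d**2 | x, then p**2 | x for each prime p | d.
        if x % (d * d) == 0:
            return False
        d += 1
    return True


def S(x):
    return 1 if is_square_free(x) else 0


def bits(x, n):
    """Bit representation x_1 ... x_n of x, most significant bit first, zero-padded."""
    return [int(b) for b in format(x, "0{}b".format(n))]


print([S(x) for x in range(1, 13)])  # [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0]
print(bits(12, 5))                   # [0, 1, 1, 0, 0]
```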

We say a polynomial $P$ over $\mathbb{Z}_M$ strongly $M$-represents $S$ if for all $1 \le x \le 2^n - 1$,

$$P(x_1, \ldots, x_n) \equiv S(x) \pmod M, \qquad (1)$$

where $x = x_1 \ldots x_n$ is the bit representation of $x$.

Similarly, we say a polynomial $P$ over $\mathbb{Z}_M$ one-sidedly $M$-represents $S$ if for all $1 \le x \le 2^n - 1$,

$$P(x_1, \ldots, x_n) \equiv 0 \pmod M \iff S(x) = 0, \qquad (2)$$

where $x = x_1 \ldots x_n$ is the bit representation of $x$.

For Boolean inputs we simply need to consider multilinear polynomials. Each polynomial over $\mathbb{Z}_M$ is of the form

$$P(x_1, \ldots, x_n) = \sum_{H \in \mathcal{H}} A_H \prod_{i \in H} x_i, \qquad (3)$$

where

$$\mathcal{H} \subseteq 2^{\{1, 2, \ldots, n\}} \quad\text{and}\quad 0 \ne A_H \in \mathbb{Z}_M.$$

We call the largest value of $|H|$ in the representation (3) the degree of $P$ and write $\deg P$. We call the number of coefficients $A_H$, or equivalently $|\mathcal{H}|$, the sparsity of $P$ and write $\operatorname{spr} P$.
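For small $n$, one multilinear polynomial satisfying (1) can be computed by Möbius inversion over the subset lattice, after which $\deg P$ and $\operatorname{spr} P$ are read off from its non-zero coefficients. This Python sketch rests on several assumptions. Condition (1) does not constrain $P(0)$, so the sketch fixes $P(0) \equiv 0$. The polynomial it finds is therefore only one representing polynomial, and its degree and sparsity are upper bounds, not the minima. Subsets $H$ are encoded as bitmasks whose bit $j$ is the input bit of weight $2^j$. This relabelling of variables changes neither the degree nor the sparsity.

```python
def is_square_free(x):
    if x == 0:
        return False
    d = 2
    while d * d <= x:
        if x % (d * d) == 0:
            return False
        d += 1
    return True


def multilinear_coeffs(values, n, M):
    """Coefficients A_H (mod M) of the unique multilinear P with P(x) = values[x] (mod M)."""
    a = [v % M for v in values]
    for j in range(n):                  # Moebius transform, one variable at a time
        bit = 1 << j
        for h in range(1 << n):
            if h & bit:
                a[h] = (a[h] - a[h ^ bit]) % M
    return a


def deg_and_spr(a):
    """Largest |H| with A_H != 0, and the number of non-zero A_H."""
    support = [h for h, c in enumerate(a) if c != 0]
    return max((bin(h).count("1") for h in support), default=0), len(support)


n, M = 6, 3
S_values = [0] + [int(is_square_free(x)) for x in range(1, 2 ** n)]  # P(0) fixed to 0
A = multilinear_coeffs(S_values, n, M)
d, s = deg_and_spr(A)
print(f"deg P = {d}, spr P = {s}")
```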

In this paper, we obtain lower bounds on the degree $\deg P$ and the sparsity $\operatorname{spr} P$ of polynomials over $\mathbb{Z}_M$ satisfying either (1) or (2) for all inputs.

Similarly to the case of polynomials over $\mathbb{Z}_M$, for a polynomial $f$ in $n$ variables over the reals $\mathbb{R}$, we define the total degree $\deg f$ as the largest sum $i_1 + \ldots + i_n$ and the sparsity $\operatorname{spr} f$ as the number of non-zero coefficients in the representation

$$f(x_1, \ldots, x_n) = \sum A_{i_1, \ldots, i_n} x_1^{i_1} \cdots x_n^{i_n}.$$



For a real $w$ we define the sign function as

$$\operatorname{sign} w = \begin{cases} 1, & \text{if } w \ge 0,\\ 0, & \text{if } w < 0. \end{cases}$$

Here we also obtain lower bounds on the degree $\deg f$ and the sparsity $\operatorname{spr} f$ of polynomials $f$ providing a threshold representation of $S(x)$ for $n$-bit integers $x$, that is, a representation of the form

$$\operatorname{sign} f(x_1, \ldots, x_n) = S(x),$$

where $x = x_1 \ldots x_n$ is the bit representation of $x$, $1 \le x \le 2^n - 1$.

Furthermore, in the case of real polynomials, the Boolean values 0 and 1 can be interpreted as two arbitrary real values $\alpha_0$ and $\alpha_1$, not necessarily $\alpha_0 = 0$ and $\alpha_1 = 1$. It is easy to see that the degree of the corresponding polynomials does not depend on the particular choice of $\alpha_0, \alpha_1$, because the representations are equivalent under a linear transformation of variables [19]. However, it is shown in [19] that the sparsity $\operatorname{spr} f$ does depend on the choice of $\alpha_0$ and $\alpha_1$. In fact, there are examples of Boolean functions demonstrating that for $(\alpha_0, \alpha_1) = (0, 1)$ and $(\alpha_0, \alpha_1) = (1, -1)$ the gap between the numbers of monomials of the corresponding polynomials can be exponentially large [19].

Threshold representations of Boolean functions via real polynomials have been studied in a number of works [8,9,14,19,24,28]. These papers contain many general estimates together with lower bounds for some particular Boolean functions. However, these Boolean functions are usually specially constructed examples which are not related to any particular number theoretic or combinatorial problem.

Representations of Boolean functions via polynomials over $\mathbb{Z}_M$ have been studied in [2,3,15,30]. In these papers, lower and upper bounds are obtained for polynomials representing the OR, $\mathrm{MOD}_M$ (which determines if the sum of the inputs is not divisible by $M$), and $\neg\mathrm{MOD}_M$ Boolean functions. We note that a polynomial of degree $d$ over $\mathbb{Z}_M$ is represented by a circuit consisting of an unbounded fan-in $\mathrm{MOD}_M$ gate at the top, where each input wire is a function of no more than $d$ variables. In [12,29], some lower bounds are obtained for polynomials over $\mathbb{Z}_2$ strongly 2-representing the Boolean function deciding the quadratic residuosity of an $n$-bit integer $x$.

In the series of papers [4,5,6,7], lower bounds have been obtained on the circuit complexity, sensitivity, degree of polynomial representation and other complexity characteristics of testing square-free numbers and computing the greatest common divisor. As in [12,29], the method of [4,5,6,7] is based on the uniformity of distribution of long patterns of 0 and 1 in the values of $S(x)$. For the quadratic residuosity a similar property has been established in [12,29] by using the very powerful Weil estimate; in [4,5,6,7] a sieve method has been used for this purpose. In particular, for a strongly 2-representing polynomial $P$ the lower bound

$$\deg P \ge 0.165\ldots n$$

has been obtained in [5]. It has also been applied to obtain a lower bound on the degree of real polynomials $P$ which approximate $S$ in the following sense: for all $1 \le x \le 2^n - 1$,

$$|S(x) - P(x_1, \ldots, x_n)| \le 1/3,$$

where $x = x_1 \ldots x_n$ is the bit representation of $x$. These lower bounds are derived from the asymptotic formula for the sensitivity of the function $S$ obtained in [5]. Unfortunately, there is no link between the sensitivity and the degrees of $M$-representing polynomials, $M \ge 3$, or of threshold representations.

Alternative methods of [1] and [32] yield stronger but less explicit complexity results (which apply to primality testing as well). However, these approaches work neither for $M$-representing polynomials nor for threshold representations.

Here we use the technique of [4,5,6,7] to obtain several new results about polynomial representations of the function $S(x)$.

Throughout the paper we denote by $\log x$ the binary logarithm of $x$, by $\ln x$ the natural logarithm of $x$, and $\exp(x) = e^x$.



2 Auxiliary Results 



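Let $\mathcal{P}$ denote the set of primes.

We use the following well-known asymptotic formulas (see [13], for example):

$$\ln \prod_{\substack{p \le x\\ p \in \mathcal{P}}} p = x + o(x) \qquad (4)$$

and

$$\pi(x) = \frac{x}{\ln x} + O\!\left(\frac{x}{\ln^2 x}\right), \qquad x \to \infty, \qquad (5)$$

for the number $\pi(x)$ of primes $p \le x$.

The following estimate can be found in [20], Section 10.11.

Lemma 1. For any integers $L$ and $N$ with $0 < L \le N/2$ the bound

$$\sum_{\nu=0}^{L} \binom{N}{\nu} \le 2^{H(L/N) N}$$

holds, where $H(\gamma) = -\gamma \log \gamma - (1 - \gamma) \log(1 - \gamma)$, $0 < \gamma < 1$, is the binary entropy function.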
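The entropy bound of Lemma 1 is easy to confirm numerically for small parameters. The following Python sketch is a spot check, not a proof. It compares exact integer binomial sums against $2^{H(L/N)N}$ for all $2 \le N \le 80$ and $0 < L \le N/2$. The helper `check_lemma1` is hypothetical.

```python
from math import comb, log2


def H(g):
    """Binary entropy function, 0 < g < 1."""
    return -g * log2(g) - (1 - g) * log2(1 - g)


def check_lemma1(n_max):
    """Return the first (N, L) violating Lemma 1, or None if there is none."""
    for N in range(2, n_max + 1):
        for L in range(1, N // 2 + 1):
            lhs = sum(comb(N, nu) for nu in range(L + 1))   # exact integer sum
            if lhs > 2 ** (H(L / N) * N):
                return (N, L)
    return None


print(check_lemma1(80))  # None
```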

Now we prove the following quite technical statement. 

Lemma 2. Let $m > 1$ be an integer and let us define $k$ from the inequalities



Let $m < p_1 < \ldots < p_m$ be the first $m$ primes which are greater than $m$. Then, for any $m$-dimensional binary vector $(\sigma_1, \ldots, \sigma_m)$, there exists an integer $y$ such that $0 \le y \le \exp(4m \ln m + O(m \ln\ln m))$ and

$$S(2^k y + p_i) = \sigma_i, \qquad i = 1, \ldots, m.$$
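For a tiny case, the conclusion of Lemma 2 can be observed by brute force: every one of the $2^m$ patterns $(\sigma_1, \ldots, \sigma_m)$ is realized by some $y$. The inequalities that define $k$ did not survive extraction, so this Python sketch assumes $k$ is the smallest integer with $2^k > p_m$. That choice is enough for each $p_i$ to fit in the last $k$ bits. With $m = 3$ the primes are $5, 7, 11$ and $k = 4$. The function names are hypothetical, and the search demonstrates the statement without following the sieve argument of the proof.

```python
from math import isqrt


def is_square_free(x):
    if x == 0:
        return False
    d = 2
    while d * d <= x:
        if x % (d * d) == 0:
            return False
        d += 1
    return True


def first_primes_above(m, count):
    """The first `count` primes greater than m (trial division)."""
    ps, c = [], m + 1
    while len(ps) < count:
        if c >= 2 and all(c % d for d in range(2, isqrt(c) + 1)):
            ps.append(c)
        c += 1
    return ps


def find_witnesses(m, y_limit):
    """Least y < y_limit realizing each pattern S(2**k * y + p_i) = sigma_i."""
    ps = first_primes_above(m, m)
    k = ps[-1].bit_length()          # assumption: smallest k with 2**k > p_m
    found = {}
    for y in range(y_limit):
        pattern = tuple(int(is_square_free(2 ** k * y + p)) for p in ps)
        found.setdefault(pattern, y)
        if len(found) == 2 ** m:     # every pattern has been seen
            break
    return ps, k, found


ps, k, found = find_witnesses(3, 200_000)
print(ps, k)  # [5, 7, 11] 4
for pattern in sorted(found):
    print(pattern, found[pattern])
```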

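Proof. Put

$$Q = \prod_{\substack{p \le m\\ p \in \mathcal{P}}} p \qquad\text{and}\qquad R = 2^k Q.$$

From (4) we see that $Q = \exp(O(m))$. Thus, taking $y = Qu$, it is enough to show that there exists an integer $u$ such that $0 \le u \le \exp(4m \ln m + O(m \ln\ln m))$ and

$$S(Ru + p_i) = \sigma_i, \qquad i = 1, \ldots, m. \qquad (6)$$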




On Polynomial Representations of Boolean Functions 309 



We remark that $\gcd(p_i, R) = 1$, $i = 1, \ldots, m$.

Let $\mathcal{I}$ be the set of subscripts $i$ for which $\sigma_i = 0$ and let $\mathcal{J}$ be the set of subscripts $j$ for which $\sigma_j = 1$. Put

$$q = \prod_{i \in \mathcal{I}} p_i^2.$$

From the Chinese Remainder Theorem we conclude that there exists an integer $a$, $0 \le a \le q - 1$, such that $Ra \equiv -p_i \pmod{p_i^2}$ for all $i \in \mathcal{I}$. Therefore, $R(qz + a) + p_i \equiv 0 \pmod{p_i^2}$ for all $i \in \mathcal{I}$ and any integer $z$. Now we show that one can select a not too large $z$ for which

$$S(R(qz + a) + p_j) = 1, \qquad j \in \mathcal{J}.$$
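The Chinese Remainder step can be made concrete. The Python sketch below combines the congruences $Ra \equiv -p_i \pmod{p_i^2}$ one at a time, using the modular inverse `pow(x, -1, m)` (Python 3.8 or later). The parameters are hypothetical. They take $m = 3$, so $Q = 2 \cdot 3$. They assume $k = 4$ and the pattern $\sigma = (0, 1, 0)$, so that $\mathcal{I}$ corresponds to the primes $5$ and $11$.

```python
def crt_offset(R, primes):
    """Return (a, q) with q = prod p^2, 0 <= a < q, and R*a = -p (mod p^2) for each p."""
    a, q = 0, 1
    for p in primes:
        m = p * p
        target = (-p * pow(R, -1, m)) % m        # requires gcd(R, p) = 1
        t = ((target - a) * pow(q, -1, m)) % m   # lift a (mod q) to a (mod q*m)
        a, q = a + q * t, q * m
    return a, q


R = 2 ** 4 * 6                  # R = 2^k Q with k = 4 and Q = 6 (hypothetical)
a, q = crt_offset(R, [5, 11])
print(a, q)
for z in range(5):
    print([(R * (q * z + a) + p) % (p * p) for p in (5, 11)])  # each line: [0, 0]
```

Because every $z$ works, the remaining freedom in $z$ is exactly what the sieve argument below exploits for the indices $j \in \mathcal{J}$.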

For $Z \ge 1$, we denote by $L_j(Z)$ the number of square-full numbers of the form $R(qz + a) + p_j$ with $1 \le z \le Z$, $j \in \mathcal{J}$. To prove the lemma it is sufficient to show that for some appropriate $Z$,

$$\sum_{j \in \mathcal{J}} L_j(Z) < Z. \qquad (7)$$

First of all, we remark that, for $i \in \mathcal{I}$ and $j \in \mathcal{J}$,

$$R(qz + a) + p_j \not\equiv 0 \pmod{p_i^2}.$$

Otherwise, we have $p_i^2 \mid (p_j - p_i)$, which is impossible since $0 < |p_j - p_i| < p_m < m^2 < p_i^2$.

For any prime $p \in \mathcal{P}$ with $\gcd(p, q) = 1$, the congruence

$$R(qz + a) + p_j \equiv 0 \pmod{p^2}, \qquad 1 \le z \le Z,$$

has at most $Z/p^2 + 1$ solutions. Obviously, it does not have solutions for $p^2 > Rq(Z + 1) + R$. Put $V = (3RqZ)^{1/2}$.

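The smallest prime divisor of any number $R(qz + a) + p_j$ exceeds $m$. Therefore,

$$\begin{aligned}
L_j(Z) &\le \sum_{\substack{m < p \le V\\ \gcd(p, q) = 1}} \left(\frac{Z}{p^2} + 1\right) \le Z \sum_{p > m} \frac{1}{p^2} + O\!\left(\frac{V}{\ln V}\right)\\
&\le Z \sum_{\nu \ge \lfloor \log m \rfloor}\ \sum_{2^{\nu+1} > p \ge 2^{\nu}} \frac{1}{p^2} + O\!\left(\frac{V}{\ln V}\right) \le Z \sum_{\nu \ge \lfloor \log m \rfloor} \frac{\pi(2^{\nu+1})}{2^{2\nu}} + O\!\left(\frac{V}{\ln V}\right)\\
&= O\!\left(Z \sum_{\nu \ge \lfloor \log m \rfloor} \frac{2^{\nu+1}}{2^{2\nu}\,\nu} + \frac{V}{\ln V}\right) = O\!\left(\frac{Z}{m \ln m} + \frac{V}{\ln V}\right).
\end{aligned}$$

Putting $Z = m^2 Rq$, we have $V = \sqrt{3}\, Z/m$, so that $\sum_{j \in \mathcal{J}} L_j(Z) = O(Z/\ln m + Z/\ln V)$, and we obtain the inequality (7), provided that $m$ is large enough. Therefore, there exists an integer $u$ satisfying condition (6) and $0 \le u \le q(Z + 1) \le 2m^2 R q^2$.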
Now, from (5) we conclude that $p_m = m \ln m + O(m \ln\ln m)$. Therefore, we have $q \le p_m^{2m} = \exp(2m \ln m + O(m \ln\ln m))$. Finally, from (4) we see that $R = \exp(O(m))$, and the result follows. □

The result of Lemma 2 can be improved by means of some more sophisticated 
sieve methods, see [17] for example. However, this does not improve our main 
results. 

3 Main Results 

First of all we consider deciding the property of being square-free via polynomials in $\mathbb{Z}_M[X_1, \ldots, X_n]$.

Theorem 1. Assume that a polynomial

$$P(X_1, \ldots, X_n) \in \mathbb{Z}_M[X_1, \ldots, X_n]$$

strongly $M$-represents $S(x)$, that is, it is such that for any $x$, $1 \le x \le 2^n - 1$,

$$P(x_1, \ldots, x_n) \equiv S(x) \pmod M,$$

where $x = x_1 \ldots x_n$ is the bit representation of $x$. Then, for sufficiently large $n$, the bounds

$$\deg P \ge 0.14 \ln n \qquad\text{and}\qquad \operatorname{spr} P \ge \frac{n}{5 \ln n}$$

hold.

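Proof. Assuming that $n$ is large enough, we put

$$m = \left\lfloor \frac{n}{5 \ln n} \right\rfloor.$$

Let $p_1, \ldots, p_m$ and $k$ be defined as in Lemma 2.

We denote by $\tau$ the number of monomials $\mu_j(w)$, $j = 1, \ldots, \tau$, in $w = (w_1, \ldots, w_k)$, such that for every $k$-dimensional vector

$$w = (w_1, \ldots, w_k) \in \{0, 1\}^k$$

we have a representation of the form

$$P(Y_1, \ldots, Y_{n-k}, w) = \sum_{j=1}^{\tau} \mu_j(w) f_j(Y_1, \ldots, Y_{n-k})$$

with some polynomials $f_j(Y_1, \ldots, Y_{n-k}) \in \mathbb{Z}_M[Y_1, \ldots, Y_{n-k}]$.

Obviously,

$$\tau \le \sum_{\nu=0}^{\deg P} \binom{k}{\nu} \qquad\text{and}\qquad \tau \le \operatorname{spr} P. \qquad (8)$$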
As in the proof of Lemma 2, we note that $p_1 < \ldots < p_m < m^2 < 2^k$. For every $i = 1, \ldots, m$, we add several leading zeroes to the binary representation of $p_i$ to obtain binary strings $s_i$ of length $k$.

If $\tau < m$, then there exist $m$ integer coefficients $c_1, \ldots, c_m$, not all equal to zero, with

$$\sum_{i=1}^{m} c_i \mu_j(s_i) = 0, \qquad j = 1, \ldots, \tau.$$

Therefore, we have the identity

$$\sum_{i=1}^{m} c_i P(X_1, \ldots, X_{n-k}, s_i) = 0.$$

Without loss of generality we can also assume that

$$\gcd(c_1, \ldots, c_m) = 1.$$

Then, for some $1 \le i_0 \le m$ we have $c_{i_0} \not\equiv 0 \pmod M$.

One easily verifies that $2^{n-k} = \exp(5m \ln m + O(m))$. Hence, from Lemma 2 we derive that there exists $y$, $0 \le y < 2^{n-k}$, such that for $i = 1, \ldots, m$,

$$P(y_1, \ldots, y_{n-k}, s_i) \equiv S(2^k y + p_i) \equiv \begin{cases} 1, & \text{if } i = i_0,\\ 0, & \text{otherwise}, \end{cases} \pmod M,$$

where $y = y_1 \ldots y_{n-k}$ is the bit representation of $y$ (with several leading zeroes, if necessary, to make it of length $n - k$). Hence,

$$\sum_{i=1}^{m} c_i P(y_1, \ldots, y_{n-k}, s_i) \equiv c_{i_0} \not\equiv 0 \pmod M.$$

From the obtained contradiction we see that $\tau \ge m \ge 2^{(k-1)/2}$. Taking into account that $H(0.1) < 1/2$ and $0.1/\ln 2 > 0.14$, from the inequalities (8) and Lemma 1 we obtain the desired result. □

Theorem 2. Let $M = p^\alpha$ be a prime power. Assume that a polynomial

$$P(X_1, \ldots, X_n) \in \mathbb{Z}_M[X_1, \ldots, X_n]$$

one-sidedly $M$-represents $S(x)$, that is, it is such that for any $x$, $1 \le x \le 2^n - 1$,

$$P(x_1, \ldots, x_n) \equiv 0 \pmod M \iff S(x) = 0,$$

where $x = x_1 \ldots x_n$ is the bit representation of $x$. Then, for sufficiently large $n$, the bounds

$$\deg P \ge 0.14 \ln n \qquad\text{and}\qquad \operatorname{spr} P \ge \frac{n}{5 \ln n}$$

hold.
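Proof. As in the proof of Theorem 1 we obtain that, for some $1 \le i_0 \le m$ and some $u \not\equiv 0 \pmod M$,

$$P(y_1, \ldots, y_{n-k}, s_i) \equiv \begin{cases} u, & \text{if } i = i_0,\\ 0, & \text{otherwise}, \end{cases} \pmod M.$$

Since $\gcd(c_1, \ldots, c_m) = 1$, the index $i_0$ can be chosen so that $c_{i_0} \not\equiv 0 \pmod p$, and hence $\gcd(c_{i_0}, M) = 1$. Therefore,

$$\sum_{i=1}^{m} c_i P(y_1, \ldots, y_{n-k}, s_i) \equiv c_{i_0} u \not\equiv 0 \pmod M,$$

and as in the proof of Theorem 1 we obtain the desired result. □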

Now we consider deciding if a given n-bit integer is square-free via real 
polynomials. 

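Theorem 3. Let $\alpha_0, \alpha_1$ be two distinct real numbers, and let $n \ge 1$ be an integer. Suppose that a polynomial

$$f(X_1, \ldots, X_n) \in \mathbb{R}[X_1, \ldots, X_n]$$

is such that for any $x$, $1 \le x \le 2^n - 1$,

$$\operatorname{sign} f(\alpha_{x_1}, \ldots, \alpha_{x_n}) = S(x),$$

where $x = x_1 \ldots x_n$ is the bit representation of $x$. Then, for sufficiently large $n$, the bounds

$$\deg f \ge 0.14 \ln n \qquad\text{and}\qquad \operatorname{spr} f \ge \frac{n}{5 \ln n}$$

hold.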

Proof. We proceed as in the proof of Theorem 2. Assuming that $n$ is large enough, we put

$$m = \left\lfloor \frac{n}{5 \ln n} \right\rfloor.$$

Let $p_1, \ldots, p_m$ and $k$ be defined as in Lemma 2.

We denote by $\tau$ the number of monomials $\mu_j(w)$, $j = 1, \ldots, \tau$, in $w = (w_1, \ldots, w_k)$, such that for every $k$-dimensional vector

$$w = (w_1, \ldots, w_k) \in \{\alpha_0, \alpha_1\}^k$$

we have a representation of the form

$$f(Y_1, \ldots, Y_{n-k}, w) = \sum_{j=1}^{\tau} \mu_j(w) f_j(Y_1, \ldots, Y_{n-k})$$

with some polynomials $f_j(Y_1, \ldots, Y_{n-k}) \in \mathbb{R}[Y_1, \ldots, Y_{n-k}]$.
Obviously,

$$\tau \le \binom{\deg f + k}{\deg f} \qquad\text{and}\qquad \tau \le \operatorname{spr} f. \qquad (9)$$

As in the proof of Lemma 2, we note that $p_1 < \ldots < p_m < m^2 < 2^k$. For every $i = 1, \ldots, m$, we add several leading zeroes to the binary representation of $p_i$ to obtain a binary string of length $k$. In this string we replace 0 by $\alpha_0$ and 1 by $\alpha_1$, and denote by $s_i \in \{\alpha_0, \alpha_1\}^k$ this new vector.

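If $\tau < m$, then there exist $m$ real coefficients $c_i$, $i = 1, \ldots, m$, not all equal to zero (and, after multiplying by $-1$ if necessary, at least one of them negative) such that

$$\sum_{i=1}^{m} c_i \mu_j(s_i) = 0, \qquad j = 1, \ldots, \tau.$$

Therefore, we have the identity

$$\sum_{i=1}^{m} c_i f(Y_1, \ldots, Y_{n-k}, s_i) = 0.$$

One can easily verify that

$$2^{n-k} = \exp(5m \ln m + O(m)).$$

Hence, from Lemma 2 (applied with $\sigma_i = 0$ if $c_i < 0$ and $\sigma_i = 1$ if $c_i \ge 0$) we derive that there exists $y$, $0 \le y < 2^{n-k}$, such that

$$c_i f(\alpha_{y_1}, \ldots, \alpha_{y_{n-k}}, s_i) > 0 \quad \text{for every } c_i < 0,$$

$$c_i f(\alpha_{y_1}, \ldots, \alpha_{y_{n-k}}, s_i) \ge 0 \quad \text{for every } c_i \ge 0,$$

where $y = y_1 \ldots y_{n-k}$ is the bit representation of $y$ (with several leading zeroes, if necessary, to make it of length $n - k$). Thus,

$$\sum_{i=1}^{m} c_i f(\alpha_{y_1}, \ldots, \alpha_{y_{n-k}}, s_i) > 0.$$

From the obtained contradiction we see that $\tau \ge m \ge 2^{(k-1)/2}$, and as in the proof of Theorem 1 we obtain the desired result. □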



4 Remarks 

It is not hard to see that the constants in our estimates can be improved.

On the other hand, we do not know how to obtain more substantial improvements of our lower bounds. In particular, they are exponentially weaker than those which are known for polynomials over $\mathbb{Z}_2$; see [5].

In addition, it would be very interesting to obtain analogues of the results of this paper for other Boolean functions related to various number theoretic problems, for example, for Boolean functions deciding primality or the parity of the number of prime divisors of $x$. Unfortunately, even more advanced sieve techniques than those used in Lemma 2 are still not powerful enough to produce such results, even under the assumption of the Extended Riemann Hypothesis.

Finally, it would be very interesting to extend Theorem 2 to arbitrary composite moduli $M$.

Several more lower bounds on some other important complexity characteristics can be obtained from quite simple considerations.

Let us define the additive complexity $C_\pm(f)$ of a polynomial $f$ over the reals as the smallest number of '+' and '-' signs necessary to write down the polynomial [11,16,18,26,27]. Obviously, for any univariate polynomial $f$,

$$C_\pm(f) \le \operatorname{spr}(f) - 1 \le \deg f,$$

but neither $\operatorname{spr}(f)$ nor $\deg f$ can be estimated in terms of $C_\pm(f)$. However, it is shown in [18,26,27] that if a non-zero polynomial $f(X) \in \mathbb{R}[X]$ has at least $N$ real zeroes, then

$$C_\pm(f) \ge$$

The notion of additive complexity is related to the straight-line complexity of $f$; see [11,16,18,26,27].

Now, let $f(x) \in \mathbb{R}[x]$ be such that

$$\operatorname{sign} f(x) = S(x), \qquad 0 \le x \le 2^n - 1.$$

If $4x + 1$ is a square-full number and $p > 1$ is a prime number such that $p^2 \mid (4x + 1)$, then $p^2 \equiv 1 \pmod 4$ and $4x + 1 = (4q + 1)p^2$ for some non-negative integer $q$. For a fixed prime $p$, there are at most $2^n/(4p^2) + 1$ integers $q$ that satisfy the above condition. Hence, the number of square-full numbers of the form $4x + 1$ is bounded above by

$$\sum_{3 \le p \le (2^n - 1)^{1/2}} \left(\frac{2^n}{4p^2} + 1\right).$$

It follows that there is a constant $c > 0$ such that there are at least $c\,2^n$ square-free numbers of the form $4x + 1$, and, since $4x$ is never square-free, $f(4x) f(4x + 1) \le 0$ for them. Therefore, $f(x)$ has at least $c\,2^n$ zeroes. This immediately provides the same bound on the degree of $f$ and the lower bound

$$C_\pm(f) \ge (0.2n)^{1/2} + O(1).$$

Following [22], for a function $f$ defined on $n$-bit integers, we define the $M_f(n)$-invariant as the largest integer $M$ such that for any $\lambda \le M$ there are two $n$-bit integers $0 \le x_1 < x_2 \le 2^n - 1$, both divisible by $\lambda$, such that $f(x_1) \ne f(x_2)$; see also [10,21,22,23] for applications to complexity theory.

It is easy to show that, for any integer $\lambda$, there exists $u < p^2$ such that $\lambda u + 1$ is square-full, where $p$ is the smallest prime number with $\gcd(\lambda, p) = 1$. Thus $p = O(\log(\lambda + 1))$. It has been shown in [17] that, for any $\varepsilon > 0$, there exists a square-free number of the form $\lambda v + 1$ with $v = O(\lambda^{8/5 + \varepsilon})$, where the implied constant depends only on $\varepsilon$.

Therefore, if $f(x) = S(x + 1)$ for $0 \le x \le 2^n - 1$, then, for any $\varepsilon > 0$, the bound

$$M_f \ge C(\varepsilon)\, 2^{5n/13 - \varepsilon}$$

holds, where $C(\varepsilon) > 0$ depends only on $\varepsilon$.
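For small $n$, the invariant of $f(x) = S(x + 1)$ can be computed by brute force. This Python sketch reads the definition as the largest $M$ such that every $\lambda \le M$ has two multiples in $[0, 2^n - 1]$ on which $f$ differs, which is the reading under which a lower bound on $M_f$ is meaningful. The function name `M_invariant` is hypothetical, and small-$n$ values say nothing about the asymptotic constant $5/13$.

```python
def is_square_free(x):
    if x == 0:
        return False
    d = 2
    while d * d <= x:
        if x % (d * d) == 0:
            return False
        d += 1
    return True


def M_invariant(f, n):
    """Largest M such that every lam <= M has multiples 0 <= x1 < x2 < 2**n
    with f(x1) != f(x2) (brute force)."""
    N = 2 ** n
    M = 0
    for lam in range(1, N):
        if len({f(x) for x in range(0, N, lam)}) < 2:   # f constant on multiples of lam
            break
        M = lam
    return M


f = lambda x: int(is_square_free(x + 1))     # f(x) = S(x + 1)
for n in range(6, 13):
    print(n, M_invariant(f, n))
```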
Abstract. The range allocation problem was recently introduced as part of an efficient decision procedure for deciding satisfiability of equivalence logic formulas with or without uninterpreted functions. These types of formulas are mainly used when proving equivalence or refinement between systems (hardware designs, compiler translations, etc.). The problem is to find in polynomial time a small finite domain for each of the variables in an equality formula $\varphi$, such that $\varphi$ is valid if and only if it is valid over this small domain. The heuristic that was presented for finding small domains was static, i.e., it finds a small set of integer constants for each variable. In this paper we show new, more flexible range allocation methods. We also show the limitations of these and other related approaches by proving a lower bound on the size of the state space generated by such procedures. To prove this lower bound we reduce the question to a graph theoretic counting question, which we believe to be of independent interest.



1 Introduction 

The range allocation problem was introduced in [PRSS98] as part of an efficient
decision procedure for equivalence logic formulas with or without uninterpreted
functions. These types of formulas are mainly used when proving equivalence or
refinement (abstraction) between systems. Deciding satisfiability (and validity)
of formulas with uninterpreted functions is of major importance due to their
broad use in abstraction. We refer the reader to [BD94], where such formulas are
used for proving equivalence between hardware designs, and to [PSS98], where they
are used for translation validation, a process in which the correctness of a
compiler's translation is proven by checking the equivalence of the source and
target code.

In the past few years several different BDD-based procedures for checking
satisfiability of such formulas have been suggested (in contrast to earlier
decision procedures, which are based on computing congruence closure [BDL96] in
combination with case splitting). Typically, the first step of these procedures
is the translation of the original formula φ to a function-free formula ψ in
equivalence logic, such that ψ is satisfiable iff φ is. Then, a procedure for
checking satisfiability of E-formulas is used for deciding ψ. This second
procedure is the focus of this paper.
Goel et al. [GSZAS98] suggest replacing all comparisons in ψ with new Boolean
variables, thus creating a new Boolean formula ψ'. The BDD of ψ' is computed
ignoring the transitivity constraints of comparisons. They then traverse the
BDD, searching for a satisfying assignment that also satisfies these
constraints. Bryant et al. [BV00] suggested avoiding this potentially
exponential traversal by explicitly computing a small set of constraints that
suffice for preserving the transitivity constraints of equality. By checking ψ'
conjoined with these constraints using a regular BDD package, they were able to
verify larger designs.

The method we present here extends the method first presented in [PRSS98],
where ψ's satisfiability is decided by allocating a small domain for each
variable, such that ψ is satisfiable if and only if it is satisfiable over this
small domain. To find this domain, the equalities in the formula are represented
as a graph, where the nodes are the variables and the edges are the equalities
and disequalities (a disequality standing for ≠) in ψ. Given this graph, a
heuristic called range allocation is used to compute a small set of values for
each variable. To complete the process, a standard BDD-based tool is used to
check satisfiability of the formula over the computed domain. In [RS01] we
elaborate on this by generating a smaller graph than [PRSS98]. This is achieved
by examining the original formula with uninterpreted functions, φ, instead of
its translated version ψ.

In this paper, we extend the second part of [PRSS98] by suggesting a more
general method of allocating finite domains to variables. Using the information
in the graph generated by [PRSS98] or [RS01], we suggest and analyze different
procedures for generating a small state space that is adequate for checking ψ.
One of our main results is a general lower bound on the size of the state space
generated by any method using only the information in this graph. This suggests
the need for a more in-depth investigation of the formula at hand, rather than
only examining which atomic equalities appear in it.



2 Equivalence Logic Formulas 

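An equivalence logic formula (called an E-formula) has the following syntax:

$$\langle \text{Formula} \rangle \leftarrow \langle \text{Term} \rangle = \langle \text{Term} \rangle \mid \neg \langle \text{Formula} \rangle \mid \langle \text{Formula} \rangle \lor \langle \text{Formula} \rangle$$

$$\langle \text{Term} \rangle \leftarrow \langle \text{Variable} \rangle \mid \mathrm{ITE}(\langle \text{Formula} \rangle, \langle \text{Term} \rangle, \langle \text{Term} \rangle)$$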

ITE(f, t, e) stands for "if f then t else e". The E-formula φ is said to be
satisfiable if there is some assignment of values to φ's variables that
satisfies φ. Therefore, an E-formula φ with variables V is a function
φ : ℕ^V → {0, 1}. However, not all such functions can be realized as
E-formulas; for example, φ(a, b) = (a > b) cannot. We therefore give a more
precise characterization.

Definition 1. (partition): Given a set V, we say that α = {α_1, ..., α_k} is a
partition of V if:

1. V = α_1 ∪ α_2 ∪ ... ∪ α_k

2. for all i ≠ j we have that α_i ∩ α_j = ∅

Given a partition α = {α_1, ..., α_k}, we write u ~_α v if there is α_i ∈ α such
that u, v ∈ α_i. In other words, a partition α gives us an equivalence relation
~_α. In fact, there is a one-to-one correspondence between the set of
equivalence relations on V and its set of partitions.

Given an assignment to the variables of V, a : V → ℕ, we denote by
partition(a) the partition of V that satisfies a(v) = a(u) ⟺ v ~_{partition(a)} u.
The following is immediate from the fact that E-formulas only query comparisons
between their input variables:

Claim. For an E-formula φ, if assignments a, b satisfy partition(a) = partition(b),
then φ(a) = φ(b).

This means that an E-formula φ on variables V can be viewed as a function of
the partitions of V instead of as a function of assignments to V. Denote by
P(V) the set of partitions of the set V; φ can therefore be viewed as a function
φ : P(V) → {0, 1}. In fact, any function from P(V) to {0, 1} can be realized as
an E-formula with variable set V.

It is easy to verify that letting each variable in an equivalence formula range
over {1, ..., |V|} suffices for checking the formula's satisfiability, since an
assignment never needs more distinct values than there are variables. This shows
that deciding satisfiability of equivalence formulas is in NP; it is
NP-complete, via a trivial reduction from satisfiability of Boolean formulas.
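The following Python sketch illustrates partition(a), the Claim above, and the brute-force satisfiability check over {1, ..., |V|}. It is our illustration, not the authors' code: an E-formula is modelled as a Python predicate on assignments (the formula `example` is made up, and ∧ is written directly rather than via ¬ and ∨), and the search is exponential, so it only suits tiny inputs.

```python
from itertools import product

def partition_of(assignment):
    """partition(a): group together the variables that receive equal values."""
    classes = {}
    for var, val in assignment.items():
        classes.setdefault(val, set()).add(var)
    return frozenset(frozenset(c) for c in classes.values())

def brute_force_sat(variables, formula):
    """Try every assignment V -> {1, ..., |V|}; return a satisfying one or None."""
    variables = list(variables)
    domain = range(1, len(variables) + 1)
    for values in product(domain, repeat=len(variables)):
        a = dict(zip(variables, values))
        if formula(a):
            return a
    return None

# Hypothetical E-formula: (x = y) and not (y = z) and ITE(x = z, x, y) = y
def example(a):
    ite = a["x"] if a["x"] == a["z"] else a["y"]
    return a["x"] == a["y"] and a["y"] != a["z"] and ite == a["y"]

sat = brute_force_sat(["x", "y", "z"], example)
print(sat, partition_of(sat))
```

Any two assignments with the same partition, such as (5, 5, 9) and (1, 1, 2) for (x, y, z), give the same truth value, as the Claim states.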

3 Equivalence Graphs 

Definition 2. (E-graph): An E-graph 𝒢 is a triple 𝒢 = (V, EQ, DQ) where V
is the set of vertices, and EQ (equality edges) and DQ (disequality edges) are
sets of unordered pairs from V.

We will use 𝒢 and ℋ to denote E-graphs, and G and H for standard graphs. For an
E-graph 𝒢 = (V, EQ, DQ) we denote V(𝒢) = V, EQ(𝒢) = EQ and DQ(𝒢) = DQ. We
denote by 𝒢^= the graph on vertices V(𝒢) and edges EQ(𝒢). We use ≤ to denote
the subgraph relation: ℋ ≤ 𝒢 if EQ(ℋ) ⊆ EQ(𝒢) and DQ(ℋ) ⊆ DQ(𝒢).

We say that a partition α satisfies an equality edge (u, v) if u ~_α v, and that
it satisfies a disequality edge (u, v) if u ≁_α v. We say that a partition α
satisfies the E-graph 𝒢 (denoted α ⊨ 𝒢) if it satisfies all of 𝒢's edges. An
E-graph is said to be satisfiable if there exists a partition that satisfies it.

Lemma 1. An E-graph 𝒢 is satisfiable iff for every (u, v) ∈ DQ(𝒢), u and v
are not connected in 𝒢^=.
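Lemma 1 gives a near-linear satisfiability test for E-graphs. A minimal Python sketch, assuming vertices are hashable labels and edges are pairs; the union-find structure is our choice and is not prescribed by the paper:

```python
def is_satisfiable(vertices, eq_edges, dq_edges):
    """Lemma 1: an E-graph is satisfiable iff no disequality edge joins two
    vertices that are connected by equality edges."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in eq_edges:  # merge the connected components of G^=
        parent[find(u)] = find(v)
    return all(find(u) != find(v) for u, v in dq_edges)
```

For example, a = b, b = c together with a ≠ c is unsatisfiable, while dropping the edge b = c makes it satisfiable.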

The algorithms of [PRSS98, RS01] construct, for a given E-formula φ, an E-graph
𝒢 satisfying the following property:

Definition 3. (adequacy of E-graphs for E-formulas): The E-graph 𝒢 is adequate
for the E-formula φ if either φ is not satisfiable, or there exists a satisfiable
ℋ ≤ 𝒢 such that every partition α ⊨ ℋ satisfies φ(α) = 1.
Hence, if we want to check whether φ is satisfiable, we only need to check φ on
a relatively small set of partitions:

Definition 4. (adequacy of partition sets for E-graphs): For an E-graph 𝒢 and
a set of partitions R ⊆ P(V(𝒢)), we say that R is adequate for 𝒢 iff for every
satisfiable ℋ ≤ 𝒢, there is α ∈ R such that α ⊨ ℋ.

This leads to the following claim:

Claim. If the partition set R is adequate for the E-graph 𝒢 and 𝒢 is adequate
for the E-formula φ, then φ is satisfiable iff there is some α ∈ R such that
φ(α) = 1.

To use this claim, we need to devise a procedure that, given an E-graph 𝒢, finds
a small partition set R that is adequate for 𝒢.
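Definition 4 can be checked by exhaustive search on very small E-graphs. The sketch below (all function names are ours) enumerates every sub-graph ℋ ≤ 𝒢, uses Lemma 1 to skip unsatisfiable ones, and looks for a partition in R that satisfies each remaining ℋ. Partitions are given as lists of vertex sets; the running time is exponential in the number of edges.

```python
from itertools import chain, combinations

def subsets(items):
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))

def components(vertices, eq_edges):
    """Connected components of the graph (vertices, eq_edges)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in eq_edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return [frozenset(g) for g in groups.values()]

def satisfies(partition, eq_edges, dq_edges):
    """alpha |= H: equality edges inside one block, disequality edges across blocks."""
    block = {v: i for i, b in enumerate(partition) for v in b}
    return (all(block[u] == block[v] for u, v in eq_edges)
            and all(block[u] != block[v] for u, v in dq_edges))

def is_adequate(vertices, eq_edges, dq_edges, partitions):
    """Definition 4 by exhaustive search over all sub-graphs H <= G."""
    for eq in subsets(eq_edges):
        comp = components(vertices, eq)
        for dq in subsets(dq_edges):
            if not satisfies(comp, eq, dq):
                continue  # by Lemma 1, this H is unsatisfiable
            if not any(satisfies(p, eq, dq) for p in partitions):
                return False
    return True
```

For a triangle whose three edges are both equality and disequality edges, all five partitions of {a, b, c} form an adequate set, while the single all-singletons partition does not.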

4 Static Range Allocation 

The method of [PRSS98] generates, for an E-formula φ, an E-graph 𝒢 that is
adequate for φ, and then uses it to find, for each variable v of φ, a small range
of natural numbers D(v). It is then proved that φ is satisfiable iff it is
satisfiable when each variable v is allowed to range only over D(v) (as opposed
to ℕ).

Therefore, we check φ over the assignment set A_D = {a | ∀v, a(v) ∈ D(v)}.
The corresponding partition set is P_D = {partition(a) | a ∈ A_D}, and the
construction of [PRSS98] guarantees that P_D is adequate for 𝒢. In general, we
say that an assignment set A is adequate for an E-graph 𝒢 if its corresponding
partition set is adequate for 𝒢.

All methods proposed so far (including those in this paper) generate a new
Boolean formula from the original E-formula φ. For example, the static range
allocation method replaces every variable of φ by a variable ranging over a
finite domain. The resulting formula is then translated to a Boolean formula
using standard methods.

We will therefore measure the complexity of our proposed methods in terms of
the number of Boolean variables in the newly generated Boolean formula (or,
equivalently, the size of the state space checked, which is 2 to the power of
the number of Boolean variables). For example, in static range allocation, the
state space size is ∏_v |D(v)|, i.e., the number of Boolean variables needed is
Σ_v log |D(v)|. This complexity measure is not always related to the true
complexity of checking the resulting formula, but it is usually a good indicator
of it.
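As an illustration of this measure, a Python sketch computing ∏_v |D(v)|, the corresponding (fractional) number of Boolean variables, and the induced partition set P_D for a small hypothetical static range. Several assignments in A_D may induce the same partition, so |P_D| can be smaller than the state space:

```python
from itertools import product
from math import log2, prod

def static_state_space(D):
    """State-space size prod_v |D(v)| and Boolean variables sum_v log2 |D(v)|."""
    size = prod(len(r) for r in D.values())
    bits = sum(log2(len(r)) for r in D.values())
    return size, bits

def induced_partitions(D):
    """P_D: the distinct partitions induced by the assignments in A_D."""
    names = list(D)
    result = set()
    for values in product(*(sorted(D[v]) for v in names)):
        classes = {}
        for var, val in zip(names, values):
            classes.setdefault(val, set()).add(var)
        result.add(frozenset(frozenset(c) for c in classes.values()))
    return result

D = {"x": {1}, "y": {1, 2}, "z": {1, 2, 3}}  # hypothetical static range
print(static_state_space(D), len(induced_partitions(D)))
```

Here the state space has 6 assignments, but they induce only the 5 partitions of a three-element set.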



5 Dynamic Range Allocation 

We propose a new method, dynamic range allocation, which improves upon the
static range allocation method. Both experimental (Section 10) and theoretical
(Section 9) results show the advantages of dynamic over static range allocation.
Definition 5. (dynamic assignment): The mapping x : V → ℕ ∪ V is a dynamic
assignment if it is acyclic; i.e., there exists an ordering of V, v_1, ..., v_n,
such that x(v_i) = v_j implies j > i.

Definition 6. (induced assignment): For the dynamic assignment x, we define
the induced (static) assignment x̄ : V → ℕ:

1. If x(v) ∈ ℕ then x̄(v) = x(v).

2. If x(v) = u ∈ V then x̄(v) = x̄(u).

Note that this is a recursive definition, and that it is well defined since we
required that dynamic assignments be acyclic.

Example 1. The dynamic assignment x(v_1) = v_2, x(v_2) = v_3 and x(v_3) = 2
induces the static assignment x̄: x̄(v_1) = 2, x̄(v_2) = 2, x̄(v_3) = 2.
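Definition 6 translates directly into a memoized recursive resolution. In this Python sketch (our own encoding, not the paper's), a dynamic assignment is a dict mapping each vertex name (a string) to either an integer or another vertex name; a cycle is reported as an error, since Definition 5 excludes it:

```python
def induced_assignment(x):
    """Compute the static assignment x-bar induced by a dynamic assignment x."""
    resolved = {}

    def resolve(v, path):
        if v in resolved:
            return resolved[v]
        if v in path:
            raise ValueError("dynamic assignment is not acyclic")
        target = x[v]
        # an int is a value; a string names the vertex whose value is copied
        value = target if isinstance(target, int) else resolve(target, path | {v})
        resolved[v] = value
        return value

    for v in x:
        resolve(v, frozenset())
    return resolved

# Example 1: x(v1) = v2, x(v2) = v3, x(v3) = 2
print(induced_assignment({"v1": "v2", "v2": "v3", "v3": 2}))
```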

For an E-graph 𝒢 and a dynamic assignment x, we write x ⊨ 𝒢 if x̄ ⊨ 𝒢 (i.e., partition(x̄) ⊨ 𝒢).

Definition 7. (dynamic range): A dynamic range for vertex set V is a function
D : V → 2^{ℕ ∪ V} that is acyclic; i.e., there exists an ordering of V,
v_1, ..., v_k, such that v_j ∈ D(v_i) implies j > i.

A dynamic range D gives rise to X_D = {x | ∀v ∈ V, x(v) ∈ D(v)}, a set of
dynamic assignments, which in turn induces a set of static assignments
X̄_D = {x̄ | x ∈ X_D}. We will therefore say that D is adequate for the E-graph 𝒢
if X̄_D is adequate for 𝒢.

One advantage of dynamic range allocation is that, given an E-formula φ and an
adequate dynamic range D for φ, we can efficiently generate a Boolean formula
that is satisfiable iff φ is, with little increase in the size of the formula.

W.l.o.g., assume that the variable ordering satisfying Definition 7 is
v_1, ..., v_n, and therefore, for every i, D(v_i) ⊆ ℕ ∪ {v_{i+1}, ..., v_n}. We
encode the range D(v_i) by a variable c_i whose domain is:

$$D(c_i) = (\mathbb{N} \cap D(v_i)) \cup \{-j \mid v_j \in D(v_i)\}$$

That is, we include in D(c_i) all the integers that are in D(v_i) and, for each
v_j ∈ D(v_i), we include −j in D(c_i).

Then, we can derive a formula χ that is equisatisfiable with φ but depends only
on the variables c_1, ..., c_n. If these variables have finite domains, i.e., all
D(v_i) are finite, then χ can easily be translated to a Boolean formula. We let

$$\chi_1 = \mathbf{let}\ (v_1 = \mathbf{if}\ (c_1 > 0)\ \mathbf{then}\ c_1\ \mathbf{else}\ v_{-c_1})\ \mathbf{in}\ \varphi$$

We then derive χ_2, χ_3, ..., χ_n successively, where for each i > 1:

$$\chi_i = \mathbf{let}\ (v_i = \mathbf{if}\ (c_i > 0)\ \mathbf{then}\ c_i\ \mathbf{else}\ v_{-c_i})\ \mathbf{in}\ \chi_{i-1}$$

Finally, we let χ = χ_n. Note that since the dynamic range is acyclic, the
resulting formula (with term-sharing) is also acyclic.

Claim. If D is adequate for 𝒢, which is adequate for φ, then the resulting
formula χ is satisfiable iff φ is.
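To make the let-chain concrete, the following Python sketch evaluates χ_n, ..., χ_1 for one valuation of c_1, ..., c_n and returns the value bound to each v_i. It assumes, as the test c_i > 0 requires, that the integer constants in each D(v_i) are positive; the function name `decode` is hypothetical:

```python
def decode(c):
    """Evaluate the let-chain for one valuation of c_1..c_n.

    c maps i (1..n) to c_i: a positive int is a constant, -j refers to v_j.
    Returns the value bound to each v_i.
    """
    n = len(c)
    v = {}
    for i in range(n, 0, -1):  # v_n is bound outermost, so evaluate it first
        ci = c[i]
        if ci > 0:
            v[i] = ci
        else:
            j = -ci
            if not i < j <= n:  # acyclicity: v_i may only refer to a later v_j
                raise ValueError(f"c_{i} = {ci} violates the variable ordering")
            v[i] = v[j]
    return v

# Mirrors Example 1: c_1 = -2 (v_1 copies v_2), c_2 = -3, c_3 = 2
print(decode({1: -2, 2: -3, 3: 2}))
```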
We now wish to construct a procedure that, given an E-graph 𝒢, constructs a
small adequate dynamic range for it. Notice that, as with static ranges, the
size of the state space when using a dynamic range D is ∏_v |D(v)|.

We will use the following notation for an E-graph 𝒢 and a vertex v ∈ V(𝒢):
Γ_EQ^𝒢(v) = {u | (u, v) ∈ EQ(𝒢)}, and Γ_DQ^𝒢(v) = {u | (u, v) ∈ DQ(𝒢)}.

We now define the E-graph 𝒢[v], which results from removing the vertex v from
𝒢 and transforming v's equality and disequality constraints into constraints on
its neighboring vertices:

1. The vertex set is V(𝒢) \ {v}.

2. Initially, 𝒢[v]'s edges are all the edges of 𝒢 that are not adjacent to v.

3. For u_1 ≠ u_2 with u_1, u_2 ∈ Γ_EQ^𝒢(v), add an equality edge (u_1, u_2).

4. For u_1 ≠ u_2 with u_1 ∈ Γ_EQ^𝒢(v) and u_2 ∈ Γ_DQ^𝒢(v), add a disequality edge (u_1, u_2).
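The construction of 𝒢[v] in Python, as a sketch under our own representation (edges as two-element frozensets, vertices as strings); rules 2 to 4 of the definition are marked in the comments:

```python
def edge(a, b):
    return frozenset((a, b))

def eliminate(vertices, eq, dq, v):
    """Build G[v]: remove v and move its constraints onto its neighbours.

    eq and dq are sets of frozenset edges; returns (vertices', eq', dq').
    """
    gamma_eq = {u for e in eq if v in e for u in e if u != v}
    gamma_dq = {u for e in dq if v in e for u in e if u != v}
    # Rule 2: keep the edges that are not adjacent to v
    new_eq = {e for e in eq if v not in e}
    new_dq = {e for e in dq if v not in e}
    # Rule 3: equality edges between every two equality neighbours of v
    new_eq |= {edge(a, b) for a in gamma_eq for b in gamma_eq if a != b}
    # Rule 4: disequality edges from equality neighbours to disequality neighbours
    new_dq |= {edge(a, b) for a in gamma_eq for b in gamma_dq if a != b}
    return set(vertices) - {v}, new_eq, new_dq
```

For instance, if a = v, v = b and v ≠ c, eliminating v leaves a = b, a ≠ c and b ≠ c.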

Example 2. In Figure 1, 𝒢_1 = 𝒢_0[a], 𝒢_2 = 𝒢_1[b], 𝒢_3 = 𝒢_2[c], etc.

The following theorem is our building block for the construction of procedures
that calculate an adequate dynamic range for a given E-graph 𝒢 (see Appendix
A.1 for its proof):

Theorem 1. For an E-graph 𝒢 and u ∈ V(𝒢), if the dynamic range D is adequate
for 𝒢[u], then D' is adequate for 𝒢, where D' is defined as follows:

1. D'(v) = D(v) for every v ≠ u.

2. If Γ_DQ(u) = ∅ and Γ_EQ(u) ≠ ∅, then D'(u) = Γ_EQ(u). Otherwise,
D'(u) = Γ_EQ(u) ∪ {unique},

where unique ∈ ℕ and unique ∉ ∪_v D(v).

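Based on Theorem 1, the following procedure finds an adequate dynamic range for
a given E-graph 𝒢:

1. Set counter ← 0.

2. Pick some vertex v of V(𝒢).

3. Set:

$$D(v) = \begin{cases} \{\mathit{counter}\} & \Gamma_{EQ}(v) = \emptyset \\ \Gamma_{EQ}(v) \cup \{\mathit{counter}\} & \Gamma_{EQ}(v) \neq \emptyset \text{ and } \Gamma_{DQ}(v) \neq \emptyset \\ \Gamma_{EQ}(v) & \Gamma_{EQ}(v) \neq \emptyset \text{ and } \Gamma_{DQ}(v) = \emptyset \end{cases}$$

4. Set 𝒢 ← 𝒢[v].

5. Set counter ← counter + 1.

6. If V(𝒢) ≠ ∅, return to Step 2.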

Notice that counter is only used to generate unique numbers. 
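Putting Theorem 1 and Steps 1 to 6 together, a self-contained Python sketch of the procedure follows. If no vertex order is given, it uses the greedy choice of the vertex that receives the smallest Step-3 domain (the heuristic the paper uses in its implementation), breaking ties by vertex name; the tie-breaking rule is our assumption. Vertex names are assumed to be strings, so that they cannot be confused with the integer counters:

```python
from math import prod

def edge(a, b):
    return frozenset((a, b))

def neighbours(edges, v):
    return {u for e in edges if v in e for u in e if u != v}

def eliminate(vertices, eq, dq, v):
    """G[v]: drop v and move its constraints onto its neighbours."""
    g_eq, g_dq = neighbours(eq, v), neighbours(dq, v)
    new_eq = {e for e in eq if v not in e} | {edge(a, b) for a in g_eq for b in g_eq if a != b}
    new_dq = {e for e in dq if v not in e} | {edge(a, b) for a in g_eq for b in g_dq if a != b}
    return vertices - {v}, new_eq, new_dq

def step3_range(eq, dq, v, counter):
    """The three cases of Step 3."""
    g_eq, g_dq = neighbours(eq, v), neighbours(dq, v)
    if not g_eq:
        return {counter}
    if g_dq:
        return g_eq | {counter}
    return g_eq

def dynamic_ranges(vertices, eq, dq, order=None):
    """Steps 1-6; with order=None, pick the vertex with the smallest Step-3 domain."""
    vertices, eq, dq = set(vertices), set(eq), set(dq)
    D, counter = {}, 0
    while vertices:
        if order is not None:
            v = order[len(D)]
        else:
            v = min(vertices, key=lambda w: (len(step3_range(eq, dq, w, counter)), w))
        D[v] = step3_range(eq, dq, v, counter)
        vertices, eq, dq = eliminate(vertices, eq, dq, v)
        counter += 1
    return D

V = {"a", "b", "c", "v"}
EQ = {edge("a", "v"), edge("v", "b")}
DQ = {edge("v", "c")}
print(dynamic_ranges(V, EQ, DQ, order=["v", "a", "b", "c"]))
```

On this toy E-graph the order v, a, b, c yields a state space of size 3 · 2 · 1 · 1 = 6, whereas the greedy order yields size 1. Since the toy graph itself is satisfiable, a single partition is indeed adequate for it.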

Example 3. Using this procedure we can generate an adequate dynamic range
for the E-graph 𝒢_0 of Figure 1:

1. Set D(a) = {c} and calculate 𝒢_1 = 𝒢_0[a].

2. Set D(b) = {0, d, e}, and 𝒢_2 = 𝒢_1[b].

3. Set D(c) = {1, d, e}, and 𝒢_3 = 𝒢_2[c].

4. Set D(d) = {2, e}, and 𝒢_4 = 𝒢_3[d].
Fig. 1. Dynamic range allocation is based on an incremental process in which vertices
are removed one by one, and their constraints are reallocated to their neighbors. The
dashed lines in graphs 𝒢_0, ..., 𝒢_4 represent equality edges, while solid lines
represent disequality edges.



5. Set D(e) = {3}. 

The resulting state space in our example is of size 1 · 3 · 3 · 2 · 1 = 18. Had we
taken a different order on the vertices when using our procedure, namely a, d, b,
c, e, the resulting state space would have been of size 12. In our implementation
we use a simple greedy heuristic that in Step 2 chooses the vertex that will be
allocated the smallest domain in Step 3. This heuristic generates a state space
of size 12 on the above example. In Section 10 we compare this procedure with the
static range allocation procedure of [PRSS98].

The sequence of E-graphs generated by our procedure is almost the same as that
generated by the procedure suggested in [BV00], except that they do not
distinguish between equality and disequality edges; therefore, the vertices in
our graphs should generally have a smaller degree. In their procedure, when a
vertex is removed from the graph, the number of Boolean variables added to the
formula equals its degree, and therefore the number of Boolean variables
appearing in their resulting formula is Σ_v degree(v), where degree(v) is the
degree of v in the graph at the time of its removal. In contrast, in our
procedure, when a vertex v_i is removed, a variable c_i ranging over
degree(v_i) + 1 values is added to the formula, and so we get
Σ_v log(degree(v) + 1) Boolean variables. Therefore, we need far fewer Boolean
variables than their method. However, we note that the Boolean formula they
generate is very different from ours and, in spite of the increased number of
Boolean variables, may actually be easier to check.
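A quick arithmetic comparison of the two counts, as a Python sketch. For simplicity it feeds the same (hypothetical) removal degrees to both formulas, even though, as noted above, the degrees in the two procedures generally differ, and it rounds log(degree(v) + 1) up to whole bits:

```python
from math import ceil, log2

def bits_bryant_velev(removal_degrees):
    """One Boolean variable per incident edge at removal time."""
    return sum(removal_degrees)

def bits_dynamic(removal_degrees):
    """A variable over degree(v) + 1 values per vertex, binary-encoded."""
    return sum(ceil(log2(d + 1)) for d in removal_degrees)

degrees = [15, 7, 3, 1, 0]  # hypothetical degrees at removal time
print(bits_bryant_velev(degrees), bits_dynamic(degrees))
```

With these degrees the first count is 26 and the second is 10; as the text notes, fewer variables need not mean an easier formula.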

6 One-Orientable Assignment Sets 

In this section we present an alternative method for finding a partition set that
is adequate for a given E-graph. This method generates a partition set of a
different kind, with a more complex representation. Although the resulting
Boolean formula has a relatively small number of Boolean variables, it is much
larger and more complex than the original one, which renders this method
impractical. It is still interesting, since we can show that on a large class of
E-graphs the number of Boolean variables is slightly more than twice the minimal
number needed.

Definition 8. (partition of a graph): A graph G defines a partition:
α_G = {W ⊆ V(G) | W is a connected component of G}

Definition 9. (one-orientable): A graph is one-orientable if its edges can be
directed in such a way that the out-degree of every vertex is at most 1.

Definition 10. (one-orientable partition set): The one-orientable partition set
of an E-graph 𝒢 is One(𝒢) = {α_H | the graph H is one-orientable and H ≤ 𝒢^=}

Proposition 1. One(𝒢) is adequate for 𝒢.
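The paper does not spell out how to test one-orientability. A standard characterization (a graph is one-orientable exactly when every connected component has at most as many edges as vertices, i.e., contains at most one cycle) gives the simple check below; this Python sketch assumes a simple graph with no self-loops or parallel edges:

```python
from collections import Counter

def is_one_orientable(vertices, edges):
    """True iff every connected component has at most as many edges as vertices."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    n_vertices = Counter(find(v) for v in vertices)
    n_edges = Counter(find(u) for u, _ in edges)
    return all(n_edges[r] <= n_vertices[r] for r in n_edges)
```

A 4-cycle is one-orientable (orient it around the cycle), but adding a chord gives its single component five edges on four vertices, so it is not.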

Example 4. Figure 2 presents the graph 𝒢^= of some E-graph 𝒢. The sub-graphs
1, ..., 5 are some of its one-orientable sub-graphs, and therefore their
corresponding partitions are in One(𝒢); i.e., {{a, b, c, d}} ∈ One(𝒢) because of
sub-graph 1, {{a}, {b}, {c}, {d}} because of 2, and {{a, b}, {c, d}} because of 5.
Note that the only sub-graph of this graph that is not one-orientable is the
graph itself.




Fig. 2. The small graphs are one-orientable sub-graphs of the graph on the left. The 
direction on the edges was added for demonstrating the fact that the sub-graphs are 
one-orientable. 



As a result, if we are given an E-formula φ together with an adequate E-graph
𝒢 for it, we can check whether φ is satisfiable by checking whether there is some
partition α ∈ One(𝒢) such that φ(α) = 1. This is implemented as follows.

Represent φ as a Boolean formula with atoms (u = v). This can be done by
flattening all the ITE terms in the formula; this flattening procedure increases
the size of the formula only polynomially. Now construct the following formula C:

1. For each v ∈ V(𝒢), C has an input variable l_v ranging over
{u | (u, v) ∈ EQ(𝒢)} ∪ {*}.

2. For each u, v ∈ V(𝒢), C contains an internal variable
e_(u,v) := (l_u = v) ∨ (l_v = u).
3. C contains a circuit for calculating the transitive closure of a graph with
vertices V(𝒢): for every u, v ∈ V(𝒢), it has an input variable e_(u,v) and an
output variable t_(u,v). There are known constructions of this circuit that are
of polynomial size (e.g., using log |V(𝒢)| successive Boolean matrix
multiplications).

4. Replace every atom (u = v) in φ by t_(u,v).

Proposition 2. C is satisfiable iff φ is satisfiable.

The general idea of the construction (see Appendix A.2 for the proof) is that
the variables l_v represent a one-orientable sub-graph of 𝒢^= that results from
undirecting all the edges (v, l_v). Then t_(u,v) represents the partition
resulting from this sub-graph, which is used as input to φ.

The size of the resulting state space is ∏_v |D(l_v)| = ∏_{v∈V} (degree(v) + 1),
where degree(v) is the degree of v in 𝒢^=.
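The construction can be simulated on small graphs. The Python sketch below is our illustration, not the paper's circuit: it enumerates every valuation of the inputs l_v (with `None` standing for *), forms the edges e_(u,v), computes t by ⌈log_2 |V|⌉ Boolean matrix squarings, and collects the induced partitions, which by the argument above are exactly the partitions in One(𝒢). The number of valuations enumerated is ∏_v (degree(v) + 1):

```python
from itertools import product
from math import ceil, log2

def closure_by_squaring(n, edges):
    """Reflexive-transitive closure via ceil(log2 n) Boolean matrix squarings."""
    t = [[i == j for j in range(n)] for i in range(n)]
    for i, j in edges:
        t[i][j] = t[j][i] = True
    for _ in range(ceil(log2(n)) if n > 1 else 0):
        t = [[any(t[i][k] and t[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return t

def one_orientable_partitions(vertices, eq_edges):
    """Enumerate every choice of l_v and collect the partitions given by t_(u,v)."""
    vs = list(vertices)
    idx = {v: i for i, v in enumerate(vs)}
    nbrs = {v: [] for v in vs}
    for u, v in eq_edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    choices = [nbrs[v] + [None] for v in vs]  # None stands for '*'
    result = set()
    for l in product(*choices):
        # e_(u,v) := (l_u = v) or (l_v = u): the chosen edges, undirected
        e = [(idx[v], idx[lv]) for v, lv in zip(vs, l) if lv is not None]
        t = closure_by_squaring(len(vs), e)
        blocks = {frozenset(vs[j] for j in range(len(vs)) if t[i][j])
                  for i in range(len(vs))}
        result.add(frozenset(blocks))
    return result
```

For a triangle every partition of {a, b, c} is produced; for the path a, b, c the partition {{a, c}, {b}} is missing, because a and c are not adjacent.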

7 Lower Bound on the Size of Partition Sets

In Section 6, we constructed an adequate partition set (the one-orientable
partition set) for a given E-graph 𝒢. This set was of size at most
∏_{v∈V} (degree(v) + 1), where degree(v) is the degree of v in 𝒢^=. In this
section we show that every partition set that is adequate for 𝒢 is of size at
least

$$\prod_{v \in V} \sqrt{\tfrac{1}{2}\, \mathit{degree}'(v) + 1}$$

where degree'(v) is the degree of v in (V(𝒢), EQ(𝒢) ∩ DQ(𝒢)). Therefore, on any
E-graph where EQ(𝒢) ⊆ DQ(𝒢), One(𝒢) is close to optimal in terms of the number
of Boolean variables needed to represent such a state space: any adequate
assignment set needs at least ½ Σ_v (log(degree(v) + 2) − 1) Boolean variables,
while representing One(𝒢) needs only slightly more than twice this number,
namely Σ_v log(degree(v) + 1).

It is important to notice that a lower bound on the size of an adequate
partition set for the E-graph 𝒢 is in fact a lower bound on the size of the state
space generated by any method that uses only the information in 𝒢. This means
that this lower bound applies to any method that examines only the set of atomic
equalities (and their polarity) that appear in the E-formula. To break this lower
bound barrier, a more careful analysis of the formula will be needed.

We start with the following definition: 

Definition 11. (maximal satisfiable sub-graph): An E-graph ℋ is a maximal
satisfiable sub-graph of 𝒢 (denoted ℋ ≪ 𝒢) if it is a satisfiable sub-graph of
𝒢, and there is no ℋ_1 that is a satisfiable sub-graph of 𝒢 such that ℋ is a
proper sub-graph of ℋ_1.

Lemma 2. If ℋ_1 ≪ 𝒢, ℋ_2 ≪ 𝒢 and ℋ_1 ≠ ℋ_2, then there is no partition α such
that α ⊨ ℋ_1 and α ⊨ ℋ_2.
Proof. Assume to the contrary that α ⊨ ℋ_1 and α ⊨ ℋ_2. Define the E-graph
ℋ = (V(𝒢), EQ(ℋ_1) ∪ EQ(ℋ_2), DQ(ℋ_1) ∪ DQ(ℋ_2)). Clearly, α ⊨ ℋ, so ℋ is
satisfiable. Also, ℋ ≤ 𝒢, and thus ℋ is a satisfiable sub-graph of 𝒢. Since
ℋ_1, ℋ_2 ≤ ℋ and both are maximal, ℋ = ℋ_1 = ℋ_2, a contradiction. □

Lemma 2 directly implies:

Corollary 1. If the partition set R is adequate for the E-graph 𝒢, then
|R| ≥ |{ℋ | ℋ ≪ 𝒢}|.

Consider an E-graph 𝒢 where EQ(𝒢) ⊆ DQ(𝒢). For this type of E-graph we can
bound the number of maximal satisfiable sub-graphs from below. We say a
partition α is connected in a graph G if every set α_i ∈ α is connected in G
restricted to the vertices of α_i. We denote by CP(G) the set of connected
partitions of G.

Example 5. In the graph of Figure 2, {{a, b}, {c}, {d}} and {{a, c, d}, {b}} are
connected partitions, yet {{c, b}, {a, d}} is not.

Proposition 3. If EQ(𝒢) ⊆ DQ(𝒢), then |CP(𝒢^=)| ≤ |{ℋ | ℋ ≪ 𝒢}|.

Proof. We define the following mapping:

ψ : {ℋ | ℋ ≪ 𝒢} → CP(𝒢^=), where ψ(ℋ) = α_{ℋ^=} (see Definition 8).

Clearly, for every ℋ, ψ(ℋ) ∈ CP(𝒢^=), since every α_i ∈ α_{ℋ^=} is a connected
component of ℋ^= ≤ 𝒢^=, and so is connected in 𝒢^=.

To prove the proposition we show that ψ is onto. For α ∈ CP(𝒢^=), define
ℋ ≤ 𝒢 as follows:

1. If u ~_α v and (u, v) ∈ EQ(𝒢), then (u, v) ∈ EQ(ℋ).

2. If u ≁_α v and (u, v) ∈ DQ(𝒢), then (u, v) ∈ DQ(ℋ).

We claim that ℋ is maximal and that ψ(ℋ) = α (clearly ℋ is satisfiable, since
α ⊨ ℋ). To show that ψ(ℋ) = α, we need to show that the connected components of
ℋ^= coincide with the sets of α:

1. Each connected component of ℋ^= is a subset of some α_i ∈ α, since every edge
(u, v) ∈ ℋ^= satisfies u ~_α v.

2. Each α_i is a subset of some connected component of ℋ^=, since all edges of
𝒢^= between vertices of α_i are in ℋ^=, and α_i is connected in 𝒢^=.

We now show that ℋ is maximal. We have to handle two cases:

1. (u, v) ∈ EQ(𝒢), (u, v) ∉ EQ(ℋ). This means that u ≁_α v, and therefore, since
EQ(𝒢) ⊆ DQ(𝒢), (u, v) ∈ DQ(ℋ); but then adding (u, v) to EQ(ℋ) makes ℋ
unsatisfiable.

2. (u, v) ∈ DQ(𝒢), (u, v) ∉ DQ(ℋ). This means that u ~_α v, implying that there
exists α_i ∈ α such that u, v ∈ α_i. α_i is connected in 𝒢^=, and therefore in
ℋ^=. Now, using Lemma 1, we see that adding (u, v) to DQ(ℋ) makes ℋ
unsatisfiable.

□
Corollary 2. For an E-graph 𝒢 such that EQ(𝒢) ⊆ DQ(𝒢), every partition set
R that is adequate for 𝒢 satisfies |R| ≥ |CP(𝒢^=)|.

Theorem 2. For every graph G,

$$|CP(G)| \geq \prod_{v \in V(G)} \sqrt{\tfrac{1}{2}\, \mathit{degree}(v) + 1}$$

We believe this theorem (and its proof) to be of independent interest, as it
addresses a natural (yet nontrivial) combinatorial counting question: the number
of connected partitions of a given graph.

In fact, using a construction similar to that of one-orientable sets, it is easy
to show that for every graph G, |CP(G)| ≤ ∏_{v∈V(G)} (degree(v) + 1). Together
with Theorem 2, this gives a good approximation of |CP(G)|.
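On small graphs, both bounds can be checked against an exact count. The Python sketch below (names are ours) enumerates all set partitions, keeps the connected ones, and computes the lower bound of Theorem 2 and the upper bound ∏_v (degree(v) + 1); brute-force enumeration is only feasible for a handful of vertices:

```python
from math import prod, sqrt

def set_partitions(items):
    """Yield all partitions of a list, as lists of lists."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        yield [[first]] + p                     # first in its own block
        for i in range(len(p)):                 # or added to an existing block
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

def is_connected(block, adj):
    block = set(block)
    start = next(iter(block))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] & block:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == block

def count_connected_partitions(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(all(is_connected(b, adj) for b in p)
               for p in set_partitions(list(vertices)))

def cp_bounds(vertices, edges):
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    lower = prod(sqrt(d / 2 + 1) for d in deg.values())  # Theorem 2
    upper = prod(d + 1 for d in deg.values())            # the upper bound above
    return lower, upper
```

For example, the path a, b, c has 4 connected partitions, while the bounds are about 2.1 and 12.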

Combining Theorem 2 with Corollary 2 we get:

Corollary 3. For an E-graph 𝒢 such that EQ(𝒢) ⊆ DQ(𝒢), every partition set
R that is adequate for 𝒢 satisfies

$$|R| \geq \prod_{v \in V(\mathcal{G})} \sqrt{\tfrac{1}{2}\, \mathit{degree}(v) + 1}$$

where degree(v) is the degree of v in 𝒢^=.

Claim. For any two E-graphs 𝒢_1 ≤ 𝒢_2, if a partition set R is adequate for 𝒢_2,
then it is also adequate for 𝒢_1.

Using this claim with Corollary 3 (applied to the E-graph
(V(𝒢), EQ(𝒢) ∩ DQ(𝒢), EQ(𝒢) ∩ DQ(𝒢)) ≤ 𝒢):

Corollary 4. For an E-graph 𝒢, every partition set R that is adequate for 𝒢
satisfies

$$|R| \geq \prod_{v \in V(\mathcal{G})} \sqrt{\tfrac{1}{2}\, \mathit{degree}'(v) + 1}$$

where degree'(v) is the degree of v in (V(𝒢), EQ(𝒢) ∩ DQ(𝒢)).



8 Proof of Theorem 2 

In order to prove Theorem 2 we need the following lemmas: 

Lemma 3. For a graph G and two disjoint sets S, T ⊆ V(G) such that
S ∪ T = V(G),

$$|CP(G)| \geq \prod_{v \in S} (\mathit{degree}_T(v) + 1)$$

where degree_T(v) = |{u ∈ T | (u, v) ∈ E}|.

Proof. Consider the following procedure that constructs a partition α:

1. Start with α = {{v} | v ∈ T} (note that α is not yet a partition of V(G)).

2. For each vertex v ∈ S, do one of the following:

(a) Take one vertex u ∈ T such that (u, v) ∈ E, and add v to u's set in α.

(b) Add the set {v} to α.

Clearly, the final α is a partition, and it is connected, since we add a vertex v
to a set only if there is a vertex u in that set such that (u, v) ∈ E.

For each vertex v ∈ S, we have to choose among degree_T(v) + 1 options in the
procedure. So, if we show that different choices always lead to different
partitions, then we have constructed a set of ∏_{v∈S} (degree_T(v) + 1) different
connected partitions of G.

First note that the constructed partition α has the following property: every
α_i ∈ α satisfies one of:

1. α_i ∩ T = ∅ and |α_i| = 1.

2. |α_i ∩ T| = 1.

From this we see that two different runs lead to different partitions. If a
vertex v chooses to join some u ∈ T, it will not be in the same set as any other
u' ∈ T, so different choices here lead to different partitions. Also, if a vertex
chooses not to join any u ∈ T, then it will be a singleton set in α, and this
cannot happen if it chooses to join some vertex of T. □
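The counting step of this proof can be checked mechanically. The following Python sketch (the function name is hypothetical) runs the procedure for every combination of choices, with `None` standing for option (b), so that one can confirm the ∏_{v∈S} (degree_T(v) + 1) runs give pairwise distinct partitions:

```python
from itertools import product

def lemma3_partitions(S, T, edges):
    """Run the procedure of Lemma 3 for every combination of choices."""
    adj = {v: set() for v in list(S) + list(T)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    S = list(S)
    options = [sorted(adj[v] & set(T)) + [None] for v in S]
    partitions = []
    for choice in product(*options):
        blocks = {u: {u} for u in T}        # step 1: singletons of T
        singles = []
        for v, u in zip(S, choice):         # step 2: (a) join u, or (b) singleton
            if u is None:
                singles.append(frozenset({v}))
            else:
                blocks[u].add(v)
        partitions.append(frozenset([frozenset(b) for b in blocks.values()] + singles))
    return partitions
```

For the path a, b, c with S = {a, c} and T = {b}, the four runs yield the four distinct connected partitions of the path.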

Lemma 4. For a graph G, there are disjoint sets S_0, S_1 such that
S_0 ∪ S_1 = V(G), and for every v ∈ S_i, degree_{S_{1−i}}(v) ≥ ½ degree(v).

Proof. Start with two arbitrary disjoint sets S_0 and S_1 whose union is V(G).
While possible, pick some v ∈ S_i such that degree_{S_{1−i}}(v) < ½ degree(v),
and move v from S_i to S_{1−i}. If there are no such vertices left, then S_0 and
S_1 satisfy the lemma.

We claim that the procedure terminates, because each such move increases the
following function:

$$\mathit{Cut}(S_0, S_1) = |\{(u, v) \in E(G) \mid u \in S_0, v \in S_1\}|$$

Indeed, when a vertex v ∈ S_1 is moved:

$$\mathit{Cut}(S_0 \cup \{v\}, S_1 \setminus \{v\}) = \mathit{Cut}(S_0, S_1) + \mathit{degree}_{S_1}(v) - \mathit{degree}_{S_0}(v)$$

By the way we pick our vertices, degree_{S_1}(v) > ½ degree(v) > degree_{S_0}(v),
so this function increases with each move (symmetrically for v ∈ S_0). Since
Cut(S_0, S_1) ≤ |E(G)|, the procedure halts after at most |E(G)| moves. □

We comment that the proof of Lemma 4 is in fact the textbook local-search
construction of a locally maximal cut in a graph.
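A Python sketch of this local search (the name `balanced_split` is ours). It starts with all vertices in S_0 and moves any vertex with fewer than half of its neighbours on the other side; termination follows from the cut argument in the proof:

```python
def balanced_split(vertices, edges):
    """Return (S0, S1) such that every vertex has at least half of its
    neighbours on the other side (Lemma 4), found by local search."""
    vertices = list(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    side = {v: 0 for v in vertices}  # arbitrary start: everything in S_0
    moved = True
    while moved:
        moved = False
        for v in vertices:
            across = sum(1 for w in adj[v] if side[w] != side[v])
            if 2 * across < len(adj[v]):  # degree_{S_{1-i}}(v) < degree(v) / 2
                side[v] = 1 - side[v]     # each move increases the cut by >= 1
                moved = True
    S0 = {v for v in vertices if side[v] == 0}
    return S0, set(vertices) - S0
```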

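Proof (of Theorem 2): Using Lemma 4, we get two disjoint sets S_0 and S_1. Now
we use Lemma 3, setting S = S_0 and T = S_1. We get:

$$|CP(G)| \geq \prod_{v \in S_0} (\mathit{degree}_{S_1}(v) + 1) \geq \prod_{v \in S_0} \left(\tfrac{1}{2}\, \mathit{degree}(v) + 1\right)$$

We use Lemma 3 again, setting S = S_1 and T = S_0. We get:

$$|CP(G)| \geq \prod_{v \in S_1} \left(\tfrac{1}{2}\, \mathit{degree}(v) + 1\right)$$

Combining:

$$|CP(G)|^2 \geq \prod_{v \in S_0} \left(\tfrac{1}{2}\, \mathit{degree}(v) + 1\right) \cdot \prod_{v \in S_1} \left(\tfrac{1}{2}\, \mathit{degree}(v) + 1\right) = \prod_{v \in V} \left(\tfrac{1}{2}\, \mathit{degree}(v) + 1\right)$$

Proving the theorem. □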



9 Static vs. Dynamic Ranges 

It is difficult to estimate the advantages of using dynamic ranges. We cannot
show any relation to the minimal partition set possible (as we did for
one-orientable ranges), but since dynamic ranges generalize static ranges, they
are at least as good. There are E-graphs where dynamic ranges seem to give no
advantage over static ones, but we will show an example where using a dynamic
range gives far better results.

Consider the following E-graph 𝒢:

1. V = {v_i | i ∈ {1, ..., n}} ∪ {u_i^j | i ∈ {1, ..., n}, j ∈ {1, ..., d}}

2. DQ = {(v_i, v_j) | i, j ∈ {1, ..., n}, i ≠ j}

3. EQ = {(v_i, v_j) | i, j ∈ {1, ..., n}, i ≠ j} ∪ {(u_i^j, v_i) | i ∈ {1, ..., n}, j ∈ {1, ..., d}}

In other words, 𝒢 is constructed of a clique v_1, ..., v_n of equality and
disequality edges, where each v_i has d equality leaves u_i^1, ..., u_i^d
connected to it. Assume D is some static range of minimal size that is adequate
for 𝒢.

Claim. For all i and j, D(v_i) ⊆ D(u_i^j).

Proof. Assume there is some l ∈ D(v_i) \ D(u_i^j). Define the range D' to be the
same as D, except that D'(v_i) = D(v_i) \ {l}. We claim that D' is adequate for
𝒢, contradicting the fact that D is minimal.

For a satisfiable ℋ ≤ 𝒢, there exists an assignment a ∈ A_D such that a ⊨ ℋ. We
find a' ∈ A_{D'} such that a' ⊨ ℋ. We split into two cases:

1. If (v_i, u_i^j) ∈ EQ(ℋ), then a(v_i) = a(u_i^j), and therefore, since
a(u_i^j) ≠ l, also a(v_i) ≠ l; we define a' = a, since a ∈ A_{D'}.

2. If (v_i, u_i^j) ∉ EQ(ℋ), then adding this edge to ℋ leaves it satisfiable
(this is because of 𝒢's structure: u_i^j has no other edges). Therefore, there is
some a ∈ A_D that satisfies this new ℋ. This a also satisfies the original ℋ.
Using the same argument as in (1), we see that a ∈ A_{D'}.

□
We therefore have that for all i and j, |D(u_i^j)| ≥ |D(v_i)|, and therefore:

$$\prod_{i,j} |D(u_i^j)| \geq \left(\prod_i |D(v_i)|\right)^d$$

Considering the restriction of 𝒢 to {v_1, ..., v_n} (a clique in which every
edge is both an equality and a disequality edge, so degree'(v_i) = n − 1) and
using Corollary 4, we have:

$$\prod_i |D(v_i)| \geq \prod_i \sqrt{\tfrac{1}{2}(n-1) + 1} \geq \left(\frac{n}{2}\right)^{n/2}$$

Therefore:

$$\prod_{v \in V(\mathcal{G})} |D(v)| = \prod_i |D(v_i)| \cdot \prod_{i,j} |D(u_i^j)| \geq \left(\prod_i |D(v_i)|\right)^{d+1} \geq \left(\frac{n}{2}\right)^{\frac{n(d+1)}{2}}$$

If we take log_2 of this number (to get the number of Boolean variables needed),
we have ½ · (d + 1) · n · (log(n) − 1). In comparison, the following dynamic
range is adequate for 𝒢:

1. D(v_i) = {i} ∪ {v_{i+1}, ..., v_n}

2. D(u_i^j) = {v_i}

This range is of size n!, i.e., O(n log(n)) Boolean variables, which does not
depend on d at all.

Note that the lower bound on the size of any static range holds for any E-graph
containing this E-graph as a sub-graph.
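To make the gap concrete, a small Python computation of both quantities for a few hypothetical values of n and d. Note that the static figure is only the lower bound derived above, while n! is the exact size of the dynamic range:

```python
from math import factorial, log2

def static_lower_bound_bits(n, d):
    """The bound derived above: (1/2) * (d + 1) * n * (log2(n) - 1)."""
    return 0.5 * (d + 1) * n * (log2(n) - 1)

def dynamic_bits(n):
    """log2 of the dynamic state space n! (independent of d)."""
    return log2(factorial(n))

for n, d in [(16, 1), (16, 10), (64, 10)]:
    print(n, d, static_lower_bound_bits(n, d), round(dynamic_bits(n), 1))
```

For n = 16 and d = 10, the static lower bound is 264 Boolean variables, while the dynamic range needs log_2(16!) ≈ 44.3.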

10 Experimental Results 

To test our dynamic range allocation procedure and compare our results to the
static range allocation procedure of [PRSS98], we generated many random
E-graphs. For each E-graph 𝒢 we calculated the size |S| of the resulting state
space, and then calculated |S|^{1/|V(𝒢)|} to give the average size of the range
for each variable.

We believe dynamic range allocation performs especially well on E-graphs that
can be divided into components with a small number of equality edges between
them. We performed the following set of tests:

Set V(𝒢) = {1, ..., 100}. For each p ∈ {.2, .25, ..., 1} and q ∈ {.01, .02, ..., .2},
we generated 10 random E-graphs, by letting each edge (i, j) be chosen with
probability p if i mod 4 = j mod 4, and with probability q otherwise. This way we
get 4 components with edge probability p for edges internal to these components,
and probability q for edges between the components. We also ran a simpler set of
tests, where for each p ∈ {.02, .04, ..., 1} we generated 10 random E-graphs on
100 vertices, where each edge is taken with probability p.
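The random graph model described above can be sketched in Python as follows. The paper does not say how each chosen edge is labelled (equality or disequality), so this sketch only generates the underlying edges; the function name and the `rng` parameter are ours. Setting k = 1 gives the simpler uniform model:

```python
import random

def random_component_graph(p, q, n=100, k=4, rng=None):
    """Keep edge (i, j) with probability p if i mod k == j mod k, else with probability q."""
    rng = rng or random.Random()
    vertices = list(range(1, n + 1))
    edges = [(i, j) for i in vertices for j in vertices
             if i < j and rng.random() < (p if i % k == j % k else q)]
    return vertices, edges

vertices, edges = random_component_graph(0.5, 0.05, rng=random.Random(1))
print(len(vertices), len(edges))
```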

In Figure 3 we see a summary of the results, where for each test we averaged the
results (|S|^{1/100}) of dynamic and static ranges over the 10 graphs, and give
Fig. 3. The ratio between the average domain sizes (of each variable) allocated by the
static and dynamic range allocation methods. The left graph shows results for graphs
with 100 vertices and 4 components, where p and q are the probabilities of adding
edges within and between the components, respectively. The right graph summarizes
the results for graphs with 100 vertices and edge probability p.



the ratio between them. We can see that in all cases the ratio is greater than
1, meaning that dynamic range allocation is at least as good (on the 10-graph
average), and that on sparse graphs (either q or p is small) we get an
improvement of approximately 2 in the average range size, which means a decrease
by a factor of about 2^{100} in the state space size. We have also implemented
dynamic range allocation as a part of a procedure for checking uninterpreted
functions [RS01], and achieved a factor 2 improvement in the range size, similar
to the results on random graphs. However, we have not found a case where this
improvement led to a significant change in running times. This is mainly because
our examples are of two types: the run either completes in less than 1 second,
or it never completes.
A Proofs

A.1 Dynamic Range Allocation

Proof (of Theorem 1). For a satisfiable $\mathcal{H} \le \mathcal{G}$, we find $x \in D'$ such that
$x \models \mathcal{H}$. We first notice that since $\mathcal{H}$ is satisfiable, $\mathcal{H}[v]$ is also satisfiable.
Also, $\mathcal{H}[v] \le \mathcal{G}[v]$. Therefore there is some $x \in D$ such that $x \models \mathcal{H}[v]$. We extend $x$
to $v$, and therefore only need to show that $x$ satisfies all edges of $\mathcal{H}$ involving $v$,
since all other edges are clearly satisfied by $x$.

1. If there is no $(u,v) \in EQ(\mathcal{H})$, then set $x(v)$ to a unique value. Since all edges
involving $v$ are disequality edges, we have $x \models \mathcal{H}$.

2. If there is some $(u,v) \in EQ(\mathcal{H})$, then set $x(v) = x(u)$:

(a) For a vertex $w$ such that $(v,w) \in EQ(\mathcal{H})$, if $w = u$ then clearly
$x(w) = x(v)$. Otherwise, there is an equality edge $(u,w) \in EQ(\mathcal{H}[v])$,
and therefore $x(w) = x(u)$, and then $x(w) = x(v)$.

(b) For a vertex $w$ such that $(v,w) \in DQ(\mathcal{H})$, since $\mathcal{H}$ is satisfiable, $w \ne u$.
Also, there is a disequality edge $(w,u) \in DQ(\mathcal{H}[v])$, and then
$x(v) = x(u) \ne x(w)$.

□ 



A.2 One-Orientable Assignment Sets

To prove Proposition 1, we need the following definitions:

Definition 12. (forest): A forest is an acyclic undirected graph. 

Definition 13. (spanning forest): A spanning forest for a graph G is a subgraph
F of G such that F is a forest, and the connected components of F and G are
the same.

Claim. Every graph has a spanning forest. 
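The Claim is constructive. One standard way to build a spanning forest, shown here as a sketch (a Kruskal-style scan with union-find, which is our choice and not necessarily the construction the authors had in mind), keeps an edge exactly when it joins two trees that are not yet connected. Applied to the equality edges of $\mathcal{H}$, it yields a forest F of the kind used in the proof of Proposition 4.

```python
def spanning_forest(vertices, edges):
    """Return a spanning forest of an undirected graph (union-find scan).

    An edge is kept exactly when its endpoints are not yet connected, so the
    result is acyclic and has the same connected components as the input.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:           # u and v lie in different trees so far
            parent[ru] = rv    # merge the two trees
            forest.append((u, v))
    return forest


print(spanning_forest(range(1, 6), [(1, 2), (2, 3), (1, 3), (4, 5)]))
```

The edge (1, 3) is dropped because 1 and 3 are already connected through 2.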

Definition 14. (forest partition set): The forest partition set of the E-graph $\mathcal{G}$
is:

$\mathcal{F}(\mathcal{G}) = \{\alpha_F \mid F \text{ is a forest and } F \le \mathcal{G}^=\}$

Proposition 4. $\mathcal{F}(\mathcal{G})$ is adequate for $\mathcal{G}$.

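Proof. Given a satisfiable $\mathcal{H} \le \mathcal{G}$, take the forest $F$ to be a spanning forest of
the graph $\mathcal{H}^=$. Clearly, $E(F) \subseteq EQ(\mathcal{H}) \subseteq EQ(\mathcal{G})$. So, $\alpha_F \in \mathcal{F}(\mathcal{G})$.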

We claim that $\alpha_F \models \mathcal{H}$:

1. If $(u,v) \in EQ(\mathcal{H})$, then $u$ and $v$ are in the same connected component of
$\mathcal{H}^=$, and therefore of $F$. This means that $u$ and $v$ are in the same class of $\alpha_F$.
2. If $(u,v) \in DQ(\mathcal{H})$, then $u$ and $v$ are not in the same connected component
of $\mathcal{H}^=$, by Lemma 1. This means that they are in different components of $F$,
and therefore in different classes of $\alpha_F$.

□ 

Proof (of Proposition 1). Since every forest is one-orientable (by rooting each
of the trees in the forest, and directing all edges towards the root), we get that
$\mathcal{F}(\mathcal{G}) \subseteq \mathit{One}(\mathcal{G})$ and therefore $\mathit{One}(\mathcal{G})$ is adequate for $\mathcal{G}$. □
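The orientation argument in the proof of Proposition 1 (root each tree and direct every edge towards the root) can be sketched with a breadth-first search. Each non-root vertex ends up with exactly one outgoing edge, to its parent, so every out-degree is at most 1. The function name and the parent-map representation are illustrative choices.

```python
from collections import defaultdict, deque


def orient_towards_roots(vertices, forest_edges):
    """Direct each edge of a forest towards the root of its tree.

    Returns a dict mapping every non-root vertex to its parent, i.e. the
    target of its single outgoing edge; roots have no outgoing edge.
    """
    adj = defaultdict(list)
    for u, v in forest_edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = {}
    seen = set()
    for root in vertices:              # every unseen vertex starts a new tree
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    parent[w] = u      # edge (w, u) points towards the root
                    queue.append(w)
    return parent


print(orient_towards_roots([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```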

Proof (of Proposition 2). We first show that the graphs represented by the
variables $e_{(u,v)}$ are all the possible one-orientable subgraphs of $\mathcal{G}^=$.

Take some one-orientable $H \le \mathcal{G}^=$. Denote by $D$ the directed graph resulting
from directing $H$'s edges in such a way that every vertex has out-degree at most
1 in $D$. We define an assignment $a$ to the input variables of $C$. For each $v$:

- If there is exactly one $u$ such that $(v, u) \in D$, set $a(f_v) = u$.

- Otherwise, there are no outgoing edges from $v$ in $D$. Set $a(f_v) = *$.

We get that $a(e_{(u,v)}) = 1$ iff $(u,v) \in H$. Therefore, $a(t_{(u,v)}) = 1$ iff $u$ and $v$ are
connected in $H$. In other words, $a(t_{(u,v)}) = 1$ iff $u$ and $v$ are in the same class of the partition induced by $H$.

So, for every $\alpha \in \mathit{One}(\mathcal{G})$, there is some assignment $a$ to $C$ such that
$a(C) = p(\alpha)$, and since $\mathit{One}(\mathcal{G})$ is adequate for $\varphi$, we conclude. □
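The circuit C itself is not shown in this excerpt, so the following is only a sketch of the semantics the proof of Proposition 2 relies on, under our reading of the garbled variable names: each vertex v has a choice variable f_v (a vertex it points to, or * for none), $e_{(u,v)}$ is 1 iff one endpoint points to the other, and $t_{(u,v)}$ is 1 iff u and v are connected by selected edges. The names `evaluate_circuit` and `choice` are hypothetical, and `None` stands for *.

```python
def evaluate_circuit(vertices, choice):
    """Evaluate the e- and t-variables for an assignment to the f-variables.

    choice[v] is the vertex that v points to; a missing key means '*'.
    e[(u, v)] is 1 iff the edge {u, v} is selected by u or by v;
    t[(u, v)] is 1 iff u and v are connected by selected edges.
    """
    vertices = list(vertices)
    e = {}
    for i, u in enumerate(vertices):
        for v in vertices[i + 1:]:
            e[(u, v)] = int(choice.get(u) == v or choice.get(v) == u)

    root = {v: v for v in vertices}   # union-find over the selected edges

    def find(v):
        while root[v] != v:
            v = root[v]
        return v

    for (u, v), bit in e.items():
        if bit:
            root[find(u)] = find(v)
    t = {(u, v): int(find(u) == find(v)) for (u, v) in e}
    return e, t


e, t = evaluate_circuit(range(1, 6), {2: 1, 3: 2, 5: 4})
```

With the parent map {2: 1, 3: 2, 5: 4} of a rooted forest on vertices 1 to 5, the selected edges are exactly the forest edges, and the t-values are 1 exactly within the two trees {1, 2, 3} and {4, 5}.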
Abstract. Given a binary relation $\mathbb{E} \cup \mathbb{R}$ on the set of ground terms
over some signature, we define an abstract rewrite closure for $\mathbb{E} \cup \mathbb{R}$.
An abstract rewrite closure can be interpreted as a specialized ground
tree transducer (pair of bottom-up tree automata) and can be used to
efficiently decide the reachability relation $\to^*_{\mathbb{E} \cup \mathbb{E}^- \cup \mathbb{R}}$. It is constructed
using a completion-like procedure. Correctness is established using proof
ordering techniques. The procedure is extended, in a modular way, to deal
with signatures containing cancellative associative commutative function
symbols.



1 Introduction 

Completion techniques for term rewriting systems, which are typically used for 
reasoning about congruence relations, have been extended in recent years to 
deal with non-symmetric relations. The general theory was outlined in [11] and 
sound and refutationally complete inference systems were obtained for dealing 
with partial congruence and partial equivalence relations [4]. Usually one
obtains suitably restricted (via ordering restrictions) chaining calculi. The gain in
efficiency of an ordered system over the unordered variants of chaining is
comparable to the improvement achieved by superposition over unrestricted
paramodulation.

This paper presents a completion based approach to decide the rewrite re- 
lation induced by a set of directed (i.e., non-symmetric) and undirected (i.e., 
symmetric) ground equations. The basic technique involves combining standard 
completion (for undirected equations) with non-symmetric completion (for di- 
rected equations). Standard completion is reflected in a superposition inference 
rule that deduces critical pairs between undirected equations. Non-symmetric 
completion yields a chaining inference rule to deduce critical pairs between di- 
rected equations. Finally, the interaction between the two kinds of equations is 
captured using a paramodulation inference rule. We first consider the problem of 

* This research was supported in part by the National Science Foundation under grants 
CCR-9902031 and CCR-0082560, and NASA Langley Research Center under con- 
tract NAS1-00079.
constructing a “convergent system”, called a rewrite closure, for a set of ground
(directed and undirected) equations. Subsequently, we extend the method to
allow for cancellative associative commutative function symbols in the signature. If
all input equations are undirected, then the problem reduces to the construction 
of congruence closure and hence, an abstract rewrite closure is a generalization 
of an abstract congruence closure [6]. 

The reachability or rewrite relation induced by a ground term rewriting sys- 
tem was shown to be decidable in [9] and [13] using, respectively, tree automata 
techniques and explicit transitive closure computation. An abstract rewrite clo- 
sure can be interpreted as a specialized “ground tree transducer” (GTT). In this 
paper, we give a set of abstract completion-like inference rules for construction of 
rewrite closures. These rules yield efficient algorithms under suitable strategies. 
Moreover, our method is extendible to richer signatures. 

Correctness of the inference system is established using proof ordering tech- 
niques. Each proof is assigned a measure and all inference rules transform a proof 
with a larger measure into a proof with a smaller measure. The desired form of 
proof, for example a rewrite proof or a valley proof, is assigned a minimal mea- 
sure. Correctness arguments based on proof orderings also show compatibility 
of the inference systems with certain kinds of simplifications. 

Apart from our interest in extending rewriting techniques to non-symmetric 
relations, this work is also motivated by our interest in developing abstract 
transformation rules for constraint solving. Typical constraints consist of equa- 
tional constraints, which are solved by a unification procedure, and ordering 
constraints, where the ordering is usually some kind of a path ordering. Almost 
all such orderings are rewrite relations that also satisfy certain additional prop- 
erties, and hence an efficient procedure for deciding rewrite relations is a crucial 
first step [13]. Note that the cancellative axiom for AC symbols is satisfied by 
any AC compatible total simplification ordering. 

Preliminaries 

Let $\Sigma$ be a set, called a signature, with an associated arity function
$\alpha: \Sigma \to 2^{\mathbb{N}}$, and let $V$ be a disjoint (denumerable) set. We define $\mathcal{T}(\Sigma, V)$ as
the smallest set containing $V$ and such that $f(t_1, \ldots, t_n) \in \mathcal{T}(\Sigma, V)$ whenever
$f \in \Sigma$, $n \in \alpha(f)$ and $t_1, \ldots, t_n \in \mathcal{T}(\Sigma, V)$. The elements of the sets $\Sigma$, $V$ and $\mathcal{T}(\Sigma, V)$
are respectively called function symbols, variables and terms (over $\Sigma$ and $V$).
Elements $c$ in $\Sigma$ for which $\alpha(c) = \{0\}$ are called constants. By $\mathcal{T}(\Sigma)$ we denote
the set $\mathcal{T}(\Sigma, \emptyset)$ of all variable-free, or ground, terms. The symbols $s, t, u, \ldots$ are
used to denote terms; $f, g, \ldots$, function symbols; and $x, y, z, \ldots$, variables.

An (undirected) equation is an unordered pair of terms, written $s \approx t$. A
directed equation or rule is an ordered pair of terms, written $s \to t$. If $\mathcal{E}$ is a set
of rules, then we define $\mathcal{E}^- = \{s \to t : t \to s \in \mathcal{E}\}$ and $\mathcal{E}^{\pm} = \mathcal{E} \cup \mathcal{E}^-$. A set $\mathcal{E}$
of rules is called a rewrite system, and the rewrite relation $\to_{\mathcal{E}}$ induced by $\mathcal{E}$ is
defined by: $u \to_{\mathcal{E}} v$ if, and only if, $v = u[r\sigma]$ is obtained from $u = u[l\sigma]$ by
replacing $l\sigma$ by $r\sigma$ in $u$, $l \to r$ is in $\mathcal{E}$, and $\sigma$ is some substitution. If $\to$ is a
binary relation, then $\leftarrow$ denotes its inverse, $\leftrightarrow$ its symmetric closure, $\to^+$ its
transitive closure, and $\to^*$ its reflexive-transitive closure. A set of rules $\mathcal{E}$ is
terminating if there exists no infinite reduction sequence $s_0 \to_{\mathcal{E}} s_1 \to_{\mathcal{E}} s_2 \cdots$ of terms.

We will mostly be interested in ground rewrite systems, denoted by non-calligraphic
symbols $\mathbb{E}$, $\mathbb{R}$. In Section 2, the arity $\alpha(f)$ of a symbol $f \in \Sigma$ is assumed to
be a singleton, and we focus on (the transitive closure of) the rewrite relation
$\to_{\mathbb{E}^{\pm} \cup \mathbb{R}}$ induced by the ground rewrite system $\mathbb{E}^{\pm} \cup \mathbb{R}$ over such a signature $\Sigma$.
In Section 3, we shall assume that $\Sigma_{AC} \subseteq \Sigma$ is a set of AC symbols. Such symbols
are varyadic, with arity $\alpha(f) = \{2, 3, 4, \ldots\}$ for $f \in \Sigma_{AC}$. If $f \in \Sigma_{AC}$, then the
extension of a rule $f(s_1, s_2) \to t$, call it $\rho$, is defined as $f(f(s_1, s_2), x) \to f(t, x)$
and is denoted by $\rho^e$. Given a rewrite system $\mathcal{R}$, by $\mathcal{R}^e$ we denote the set $\mathcal{R}$ plus
extensions of rules in $\mathcal{R}$. By $AC\backslash\mathcal{R}$ we denote the rewrite system consisting of all
rules $u \to v$ such that $u =_{AC} u'\sigma$ and $v = v'\sigma$, for some rule $u' \to v'$ in $\mathcal{R}$ and
some substitution $\sigma$.

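A proof of $s \to^* t$ (in $\mathcal{E}$) is a finite sequence $s = s_0 \to_{\mathcal{E}} s_1$, $s_1 \to_{\mathcal{E}} s_2$, $\ldots$,
$s_{k-1} \to_{\mathcal{E}} s_k = t$ ($k \ge 0$), which is usually written in abbreviated form as
$s = s_0 \to_{\mathcal{E}} s_1 \to_{\mathcal{E}} \cdots \to_{\mathcal{E}} s_k = t$ ($k \ge 0$).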



2 Abstract Rewrite Closure 

We closely follow the idea of an abstract congruence closure [6] in defining the 
notion of an abstract rewrite closure. More specifically, we flatten out terms via 
introduction of new constants and corresponding definitions. 

Definition 1. Let $\Sigma$ be a signature and $K$ be a set of constants disjoint from $\Sigma$.
A D-rule (with respect to $\Sigma$ and $K$) is a rewrite rule of the form $f(c_1, \ldots, c_k) \to c$,
where $f \in \Sigma$ is a $k$-ary function symbol and $c_1, \ldots, c_k, c$ are constants in the set
$K$. A rewrite rule of the form $c \to f(c_1, \ldots, c_k)$ will be called a reverse D-rule.

A C-rule (with respect to $K$) is a rule $c \to d$, where $c$ and $d$ are constants
in $K$.

A set of D-rules and C-rules (with respect to $\Sigma$ and $K$) is a specification of the
transitions of a bottom-up tree automaton [8]. The set $K$ represents the “states” of
the tree automaton. Thus, D-rules and C-rules represent regular transitions and
$\varepsilon$-transitions, respectively. A set of ground equations and rules, say
$I_0 = \mathbb{E}_0 \cup \mathbb{R}_0$, where $\mathbb{E}_0 = \{f(g(a,b), g(a,b)) \approx a\}$ and $\mathbb{R}_0 = \{a \to b\}$,
can be represented as $I_1 = \{f(c_3, c_3) \approx c_1,\ c_1 \to c_2\}$ by introducing the set
$E_1 = \{a \to c_1,\ b \to c_2,\ g(c_1, c_2) \to c_3\}$ of D-rules.

A constant $c$ in $K$ is said to represent a term $t$ in $\mathcal{T}(\Sigma)$ via the rewrite system
$E$ if $t \to^*_E c$. For example, the constant $c_3$ represents the term $g(a, b)$ via $E_1$.

Definition 2 (Abstract rewrite closure). Let $\Sigma$ be a signature and $K$ be a
set of constants disjoint from $\Sigma$. A ground rewrite system $E \cup F \cup B$ is said to
be an (abstract) rewrite closure (with respect to $\Sigma$ and $K$) if

(i) $E$ and $F$ are both sets of D-rules and C-rules, $B$ is a set of reverse D-rules
and C-rules, such that each constant $c \in K$ represents some term $t \in \mathcal{T}(\Sigma)$
via $E$, and
(ii) the rewrite systems $E \cup F$ and $E \cup B^-$ are terminating; and for all terms
$s, t \in \mathcal{T}(\Sigma)$, if $s \to^*_{E^{\pm} \cup F \cup B} t$ then $s \to^*_{E \cup F} \circ \leftarrow^*_{E \cup B^-} t$.

Moreover, if $I = \mathbb{E} \cup \mathbb{R}$ is a set of ground equations and rules over $\mathcal{T}(\Sigma)$ such
that

(iii) for all terms $s$ and $t$ in $\mathcal{T}(\Sigma)$, $s \to^*_{\mathbb{E}^{\pm} \cup \mathbb{R}} t$ if and only if $s \to^*_{E^{\pm} \cup F \cup B} t$,

then $E \cup F \cup B$ will be called an (abstract) rewrite closure for (the rewrite relation
induced by) $I$.

From the set $E \cup F \cup B$, one can obtain a pair of (bottom-up) tree automata [8]:
the set $E \cup F$ defines the transitions of the first automaton and the set $E \cup B^-$
defines the transitions of the second automaton (over the same set $K$ of “states”).
Such a pair defines a binary relation $\to^*_{E \cup F} \circ \leftarrow^*_{E \cup B^-}$ on ground terms
and is called a “ground tree transducer” in the tree automata literature [8].

Using a combination of standard completion and non-symmetric completion, 
which we present next, we can obtain a rewrite closure $E_2 \cup F_2 \cup B_2$ for the
set $I_0 = \mathbb{E}_0 \cup \mathbb{R}_0$, where $E_2 = \{a \to c_1,\ b \to c_2,\ g(c_1, c_2) \to c_3,\ f(c_3, c_3) \to c_1\}$,
$F_2 = \{c_1 \to c_2\}$, and $B_2 = \{c_3 \to g(c_2, c_2)\}$. A rewrite closure for $\mathbb{E} \cup \mathbb{R}$
gives a decision procedure for the rewrite relation $\to^*_{\mathbb{E}^{\pm} \cup \mathbb{R}}$.
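The following sketch illustrates how a rewrite closure yields a decision procedure. It is our illustration, not the authors' implementation: terms are nested tuples, each automaton is a pair of D-rule and C-rule lists, `states` computes the set of constants a term reaches bottom-up (a naive form of the subset construction mentioned in the related-work discussion), and `gtt_related` searches for a common context whose holes meet in a shared constant. All names are hypothetical. The data is the closure $E_2$, $F_2$, $B_2$ for $I_0$ given above, with $B_2$ reversed to obtain the D-rules of $E_2 \cup B_2^-$.

```python
# Ground terms are tuples: ("a",) is the constant a, ("g", s, t) is g(s, t).
# An automaton is (d_rules, c_rules): D-rules (f, (c1, ..., ck), c) stand for
# f(c1, ..., ck) -> c, and C-rules (c, d) stand for c -> d (epsilon moves).


def states(term, automaton):
    """Set of constants that `term` rewrites to, computed bottom-up."""
    d_rules, c_rules = automaton
    kids = [states(t, automaton) for t in term[1:]]
    result = {c for (f, args, c) in d_rules
              if f == term[0] and len(args) == len(kids)
              and all(x in s for x, s in zip(args, kids))}
    changed = True
    while changed:                        # close under C-rules
        changed = False
        for c, d in c_rules:
            if c in result and d not in result:
                result.add(d)
                changed = True
    return result


def gtt_related(s, t, auto1, auto2):
    """Decide s ->*_{auto1} o <-*_{auto2} t by choosing a common context."""
    if states(s, auto1) & states(t, auto2):   # both sides meet in a constant
        return True
    return (s[0] == t[0] and len(s) == len(t)  # same symbol: descend
            and all(gtt_related(x, y, auto1, auto2) for x, y in zip(s[1:], t[1:])))


E2 = [("a", (), "c1"), ("b", (), "c2"),
      ("g", ("c1", "c2"), "c3"), ("f", ("c3", "c3"), "c1")]
F2 = [("c1", "c2")]
B2_inv = [("g", ("c2", "c2"), "c3")]   # B2^-: reverse of c3 -> g(c2, c2)
AUTO1 = (E2, F2)                        # E2 u F2
AUTO2 = (E2 + B2_inv, [])               # E2 u B2^-

a, b = ("a",), ("b",)
gab = ("g", a, b)
print(gtt_related(("f", gab, gab), b, AUTO1, AUTO2))   # True
print(gtt_related(b, a, AUTO1, AUTO2))                 # False
```

The first call reflects $f(g(a,b), g(a,b)) \approx a \to b$; the second is false because b cannot be rewritten in $I_0$. The recursion recomputes state sets for repeated subterms; memoizing `states` would avoid that.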



Construction of Rewrite Closure 



We next present an inference system to construct a rewrite closure for a finite set
$I$ of ground equations and rules over the signature $\Sigma$. Our description is fairly
abstract, in terms of transition rules that operate on tuples $(I, E, R)$, where
$I = \mathbb{E} \cup \mathbb{R}$ is a set of ground equations and rules (over $\Sigma$), and $E$ and $R$¹
are sets of (reverse) D-rules and C-rules. Tuples represent possible states in the
process of constructing a rewrite closure. The initial state is $(I_0, \emptyset, \emptyset)$, where $I_0$
is the input set of ground equations and rules (over $\mathcal{T}(\Sigma)$).

The transition rules can be derived from those for standard completion and 
non-symmetric completion as described in [3] and [11], with some differences so 
that a system is constructed over an extended signature. We assume that the new 
constants are chosen from an infinite set $U$ disjoint from $\Sigma$, which is endowed
with an ordering² $\succ_U$.

Equations and rules are flattened using extension and simplification. 



Extension:

$$\frac{(I[s], E, R)}{(I[c], E \cup \{s \to c\}, R)}$$

if $s \to c$ is a D-rule and $c \in U$ is a new constant³.



Simplification1:

$$\frac{(I[s], E \cup \{s \to c\}, R)}{(I[c], E \cup \{s \to c\}, R)}$$
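Repeated Extension and Simplification1 steps amount to flattening a term bottom-up into D-rules. A minimal sketch, assuming terms are nested tuples and fresh constants are drawn in the order c1, c2, ... (both illustrative choices); `flatten` and its dictionary of D-rules are hypothetical names.

```python
import itertools


def flatten(term, d_rules, fresh):
    """Return a constant representing `term`, adding D-rules when needed.

    Terms are tuples such as ("g", ("a",), ("b",)); `d_rules` maps a flat
    left-hand side (f, c1, ..., ck) to the constant it rewrites to.
    """
    key = (term[0],) + tuple(flatten(t, d_rules, fresh) for t in term[1:])
    if key not in d_rules:          # Extension: name the subterm by a new constant
        d_rules[key] = next(fresh)
    return d_rules[key]             # Simplification1: reuse an existing D-rule


# Flatten both sides of f(g(a, b), g(a, b)) ~ a from the running example.
fresh = (f"c{i}" for i in itertools.count(1))
rules = {}
a, b = ("a",), ("b",)
lhs = flatten(("f", ("g", a, b), ("g", a, b)), rules, fresh)
rhs = flatten(a, rules, fresh)
print(lhs, rhs)   # c4 c1
print(rules)
```

Applied to the running example this produces the D-rules a → c1, b → c2, g(c1, c2) → c3 and f(c3, c3) → c4, leaving the equation c4 ≈ c1. The derivation in Example 1 below avoids the extra constant by orienting f(c3, c3) ≈ c1 directly as a D-rule.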



¹ The set $R$ will later be partitioned into the set $F$ of forward rules and the set $B$ of
backward rules.

² By an ordering we mean any irreflexive and transitive relation on terms.

³ The notation $I[s]$ denotes that $s$ occurs as a subterm in some equation or rule in $I$,
and $I[c]$ denotes the new set obtained by replacing that occurrence of $s$ in $I$ by $c$.
Once an equation or rule in $I$ is of the form of a D-rule, reverse D-rule, or
a C-rule, it can be oriented.



Orientation:

$$\frac{(I \cup \{s \approx c\}, E, R)}{(I, E \cup \{s \to c\}, R)} \qquad \frac{(I \cup \{u \to v\}, E, R)}{(I, E, R \cup \{u \to v\})}$$

if $s \to c$ is either a D-rule or a C-rule with $s \succ_U c$, and $u \to v$ is either a C-rule,
D-rule, or a reverse D-rule.

Trivial equations and rules are deleted. 



Deletion:

$$\frac{(I \cup \{s \approx s\}, E, R)}{(I, E, R)} \qquad \frac{(I \cup \{s \to s\}, E, R)}{(I, E, R)} \qquad \frac{(I, E, R \cup \{s \to s\})}{(I, E, R)}$$



Deduction in standard completion, as well as in non-symmetric completion, 
is based on computation of critical pairs. There are three kinds of critical pair 
computations — (i) between two rules in E, which are handled by superposition; 
(ii) between a rule in E and a rule in R, which are handled by paramodulation; 
and (iii) between two rules in R, which are handled by chaining. 



Superposition:

$$\frac{(I, E \cup \{t \to c,\ s[t] \to d\}, R)}{(I, E \cup \{t \to c,\ s[c] \to d\}, R)} \qquad \frac{(I, E \cup \{t \to c,\ t \to d\}, R)}{(I, E \cup \{t \to c,\ d \to c\}, R)}$$

if $s[t] \ne t$ in the first case and $d \succ_U c$ in the second case.

The set $R$ can be partitioned into the set
$F = \{s \to t \in R : s \to t \text{ is a D-rule or a C-rule with } s \succ_U t\}$ of forward rules and the set
$B = \{s \to t \in R : t \to s \text{ is a D-rule or a C-rule with } t \succ_U s\}$ of backward rules.

Definition 3. Let $E$ and $R = F \cup B$ be sets of (reverse) D-rules and C-rules.
The set $CP(E, R)$ of critical pairs between rules in $E$ and $R$ is defined as:

$$CP(E, R) = \{f(\ldots, d, \ldots) \to c : f(\ldots, d', \ldots) \to c \in E \text{ and } d \to d' \in B\} \cup \{c \to f(\ldots, d, \ldots) : f(\ldots, d', \ldots) \to c \in E \text{ and } d' \to d \in F\}.$$

The set $CP(R)$ of critical pairs between rules in $R$ is defined as:

$$CP(R) = \{t[d] \to c : d \to s \in B \text{ and } t[s] \to c \in F\} \cup \{c \to t[d] : c \to t[s] \in B \text{ and } s \to d \in F\}.$$

Note that if the sets $E$ and $R$ contain only D-rules, reverse D-rules, and C-rules,
then so do the sets $CP(E, R)$ and $CP(R)$.
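The two critical-pair sets of Definition 3 can be computed directly for flat terms. This sketch follows our reading of the definition: constants are strings, flat terms f(c1, ..., ck) are tuples, rules are (lhs, rhs) pairs, and for CP(E, R) only argument positions of E-left-hand sides are rewritten, following the f(…, d′, …) pattern. The function names are hypothetical.

```python
def replace_one(t, s, d, at_root=True):
    """All results of replacing one occurrence of s by d in the flat term t."""
    out = [d] if at_root and t == s else []
    if isinstance(t, tuple):                 # t = (f, c1, ..., ck)
        for i in range(1, len(t)):
            if t[i] == s:
                out.append(t[:i] + (d,) + t[i + 1:])
    return out


def cp_e_r(e_rules, forward, backward):
    """CP(E, R): overlaps of an E-rule f(..., d', ...) -> c with B or F."""
    pairs = set()
    for lhs, c in e_rules:
        for d, d1 in backward:                            # d -> d' in B
            for new in replace_one(lhs, d1, d, at_root=False):
                pairs.add((new, c))                       # f(..., d, ...) -> c
        for d1, d in forward:                             # d' -> d in F
            for new in replace_one(lhs, d1, d, at_root=False):
                pairs.add((c, new))                       # c -> f(..., d, ...)
    return pairs


def cp_r(forward, backward):
    """CP(R): chaining between backward and forward rules."""
    pairs = set()
    for d, s in backward:                                 # d -> s in B
        for ts, c in forward:                             # t[s] -> c in F
            pairs.update((td, c) for td in replace_one(ts, s, d))
    for c, ts in backward:                                # c -> t[s] in B
        for s, d in forward:                              # s -> d in F
            pairs.update((c, td) for td in replace_one(ts, s, d))
    return pairs


E2_RULES = [(("a",), "c1"), (("b",), "c2"),
            (("g", "c1", "c2"), "c3"), (("f", "c3", "c3"), "c1")]
print(cp_e_r(E2_RULES, [("c1", "c2")], []))   # {('c3', ('g', 'c2', 'c2'))}
```

The call reproduces the Paramodulation step of Example 1 below: from g(c1, c2) → c3 in E and c1 → c2 in F it derives c3 → g(c2, c2).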



Chaining and Paramodulation:

$$\frac{(I, E, R)}{(I, E, R \cup \{s \to t\})}$$

if $s \to t \in CP(R) \cup CP(E, R)$.
A crucial component of deductive inference systems is simplification. In the
ground case, several deduction steps reduce to simplification. In particular, the 
rules in E can be used to simplify terms in R. 



Simplification2:

$$\frac{(I, E \cup \{s \to c\}, R[s])}{(I, E \cup \{s \to c\}, R[c])}$$



Composition:

$$\frac{(I, E \cup \{c \to d,\ s \to c\}, R)}{(I, E \cup \{c \to d,\ s \to d\}, R)}$$



Example 1. Consider the set $I_0 = \{f(g(a,b), g(a,b)) \approx a,\ a \to b\}$ of equations
and rules. An abstract rewrite closure for $I_0$ can be derived from
$(I_0, E_0, R_0) = (I_0, \emptyset, \emptyset)$ as follows (assuming $U = \{c_0, c_1, c_2, \ldots\}$ with
$c_i \succ_U c_j$ for $i < j$):



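i | Input $I_i$ | Equations $E_i$ | Rules $R_i$ | Transition rule
0 | $I_0$ | $\emptyset$ | $\emptyset$ |
1 | $\{fgabgab \approx a\}$ | $\{a \to c_1,\ b \to c_2\}$ | $\{c_1 \to c_2\}$ | Ext$^2$ ∘ Ori
2 | $\{f c_3 c_3 \approx a\}$ | $E_1 \cup \{g c_1 c_2 \to c_3\}$ | $R_1$ | Sim$^4$ ∘ Ext ∘ Sim
3 | $\emptyset$ | $E_2 \cup \{f c_3 c_3 \to c_1\}$ | $R_1$ | Sim ∘ Ori
4 | $\emptyset$ | $E_3$ | $R_1 \cup \{c_3 \to g c_2 c_2\}$ | Par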



Since no further rules are added, the rewrite system $E_4 \cup F_4 \cup B_4$, where
$F_4 = \{c_1 \to c_2\}$ and $B_4 = \{c_3 \to g(c_2, c_2)\}$, is an abstract rewrite closure for $I_0$.



Correctness 

We use the symbol $\vdash$ to denote the one-step transition relation on states induced
by the above transition rules. A derivation is a sequence of states
$(I_0, E_0, R_0) \vdash (I_1, E_1, R_1) \vdash \cdots$. A derivation is said to be fair if any transition
rule which is continuously enabled is eventually applied. The set $E_\infty$ of persisting
rules is defined as $\bigcup_i \bigcap_{j \ge i} E_j$; and similarly, $R_\infty = \bigcup_i \bigcap_{j \ge i} R_j$.

We shall prove that any fair derivation will only generate finitely many per- 
sisting rewrite rules in the second and third components. 

Theorem 1. Let $I_0$ be a finite set of ground equations and rules. The set $E_\infty \cup R_\infty$
of persisting rules in any fair derivation starting from the state $(I_0, \emptyset, \emptyset)$ is
finite.

Proof. Each inference step either reduces, or leaves unchanged, the number of
$\Sigma$-symbols in the $I$-component. The inference rule which introduces new constants,
Extension, always reduces this number. Therefore, the number of new constants
introduced in any derivation is finite. Let this number be $n$.

If the maximum arity of any function symbol in $\Sigma$ is $c$, then the number of
distinct D-rules is bounded by $|\Sigma| n^{c+1}$ and the number of distinct C-rules is
bounded by $n^2$. Consequently, the sets $E_\infty$ and $R_\infty$ are finite. ■
Theorem 2 (Soundness). If $(\mathbb{E}_0 \cup \mathbb{R}_0, E_0, R_0) \vdash (\mathbb{E}_1 \cup \mathbb{R}_1, E_1, R_1)$, then
the rewrite relation induced by $\mathbb{E}_1^{\pm} \cup \mathbb{R}_1 \cup E_1^{\pm} \cup R_1$ is identical to the rewrite
relation induced by $\mathbb{E}_0^{\pm} \cup \mathbb{R}_0 \cup E_0^{\pm} \cup R_0$ over the set $\mathcal{T}(\Sigma \cup K_0)$ of terms,
where $K_0 \subseteq U$ is the set of new constants introduced until state $(\mathbb{E}_0 \cup \mathbb{R}_0, E_0, R_0)$.



Proof Ordering. The correctness of the procedure will be established using 
proof simplification techniques, as described by Bachmair [1] and Bachmair and 
Dershowitz [2], but specialized to our case of standard and non-symmetric ground 
completion. Let $\succ$ be any reduction ordering⁴ which contains $\succ_U$ and also
orients D-rules from left to right. For instance, a recursive path ordering with an
appropriate precedence on function symbols is such an ordering.

Let $s = C[u] \to C[v] = t$ be a proof step using the equation or rule
$u \to v \in \mathbb{E}^{\pm} \cup \mathbb{R} \cup E^{\pm} \cup R$. The complexity of this proof step is defined by

$$\begin{array}{ll}
(\{s, t\}, \bot, \bot, \bot) & \text{if } u \to v \in \mathbb{E}^{\pm} \\
(\{s, t\}, \bot, \bot, \bot) & \text{if } u \to v \in \mathbb{R} \\
(\{s\}, u, \bot, t) & \text{if } u \to v \in E \\
(\{s\}, u, \top, t) & \text{if } u \to v \in R,\ u \succ v \\
(\{t\}, v, \bot, s) & \text{if } u \to v \in E^- \\
(\{t\}, v, \top, s) & \text{if } u \to v \in R,\ v \succ u
\end{array}$$

where $\bot$ and $\top$ are new symbols assumed to be minimum and maximum, respectively.
Tuples are compared lexicographically using the multiset extension of the
ordering $\succ$ on terms in the first component, and the ordering $\succ$ in the second
and fourth components. The complexity of a proof is the multiset of complexities
of its proof steps. The multiset extension of the ordering on tuples yields a proof
ordering, denoted by $\succ_P$. The ordering is well-founded as it is a lexicographic
combination of well-founded orderings.
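The multiset extension used twice in the proof ordering (for the first component of a complexity tuple, and for comparing proofs as multisets of step complexities) is the standard Dershowitz-Manna construction. A sketch, with integers standing in for terms ordered by ≻ and the strings `BOT` and `TOP` standing in for ⊥ and ⊤ (all illustrative assumptions). The usage lines model a Simplification2 step with $s[u] \succ t$, taking s[u] = 5, u = 3, s[v] = 4 and t = 2, and tuples of the form (multiset, redex, ⊥/⊤, other end) as in the proof of Lemma 1.

```python
from collections import Counter

BOT, TOP = "BOT", "TOP"   # stand-ins for the minimum and maximum symbols


def multiset_greater(m, n, gt):
    """Dershowitz-Manna multiset extension of the strict order gt."""
    m, n = Counter(m), Counter(n)
    only_m, only_n = m - n, n - m
    if not only_m and not only_n:
        return False                     # equal multisets
    return all(any(gt(x, y) for x in only_m) for y in only_n)


def with_bounds(gt):
    """Extend gt with BOT as least and TOP as greatest element."""
    def greater(x, y):
        if x == y:
            return False
        if x == TOP or y == BOT:
            return True
        if x == BOT or y == TOP:
            return False
        return gt(x, y)
    return greater


def complexity_greater(c1, c2, gt):
    """Lexicographic comparison of complexity tuples (multiset, t2, t3, t4)."""
    if Counter(c1[0]) != Counter(c2[0]):
        return multiset_greater(c1[0], c2[0], gt)
    g = with_bounds(gt)
    for x, y in zip(c1[1:], c2[1:]):
        if x != y:
            return g(x, y)
    return False


def gt(x, y):
    return x > y


def proof_gt(p, q):
    return complexity_greater(p, q, gt)


old = ((5,), 5, TOP, 2)                              # s[u] -> t by R_0
new_steps = [((5,), 3, BOT, 4), ((4,), 4, TOP, 2)]   # s[u] -> s[v], s[v] -> t
print(multiset_greater([old], new_steps, proof_gt))  # True: new proof is smaller
```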

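Lemma 1. Suppose $(I_0, E_0, R_0) \vdash (I_1, E_1, R_1)$. If $\pi$ is a ground proof
$s_0 \to s_1 \to \cdots \to s_k$ in $\mathbb{E}_0^{\pm} \cup \mathbb{R}_0 \cup E_0^{\pm} \cup R_0$, then there is a proof $\pi'$,
$s_0 = s'_0 \to s'_1 \to \cdots \to s'_l = s_k$, in $\mathbb{E}_1^{\pm} \cup \mathbb{R}_1 \cup E_1^{\pm} \cup R_1$, such that $\pi$ is greater
than or equal to $\pi'$ in the proof ordering.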

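Proof. We need to check that each equation or rule in
$(\mathbb{E}_0 - \mathbb{E}_1)^{\pm} \cup (\mathbb{R}_0 - \mathbb{R}_1) \cup (E_0 - E_1)^{\pm} \cup (R_0 - R_1)$ has a simpler proof in
$\mathbb{E}_1^{\pm} \cup \mathbb{R}_1 \cup E_1^{\pm} \cup R_1$ for each transition rule application. The details can be found in [16].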

For instance, consider the case of the Simplification2 inference rule, where
$s[u] \to t \in R_0$ is simplified to $s[v] \to t \in R_1$ by the rule $u \to v \in E_0$. The old proof
$s[u] \to_{R_0} t$ is replaced by the new proof $s[u] \to_{E_1} s[v] \to_{R_1} t$. If $s[u] \succ t$, then
the new proof is smaller because the rewrite step $s[u] \to_{R_0} t$ is more complex
than (a) the proof step $s[u] \to s[v]$ in either the second component, if $s[u] \ne u$,
or the third component; and (b) the proof step $s[v] \to t$ in the first component,
as $s[u] \succ s[v]$ and $s[u] \succ t$. Next suppose that $t \succ s[u]$. In this case, the old
rewrite step $s[u] \to_{R_0} t$ is more complex than (a) the proof step $s[u] \to s[v]$ in
the first component, as $t \succ s[u]$; and (b) the proof step $s[v] \to t$ in the fourth
component, as $s[u] \succ s[v]$. ■

Theorem 3 (Completeness). Let $I_0$ be a finite set of equations and rules.
If $(I_\infty, E_\infty, F_\infty \cup B_\infty)$ is the persisting state of a fair derivation starting from
⁴ A reduction ordering is an ordering that is well-founded and closed under contexts.
$(I_0, \emptyset, \emptyset)$, then the rewrite system $E_\infty \cup F_\infty \cup B_\infty$ is an abstract rewrite closure
for $I_0$.

Proof. (Sketch) Fairness implies that all superposition, paramodulation, and
chaining inferences between rules in $E_\infty$ and $R_\infty$ are contained in the set
$(\bigcup_i E_i) \cup (\bigcup_i R_i)$. Fairness also implies that $I_\infty$ is empty. Since the proof
ordering is well-founded, it follows from Lemma 1 that for every proof in
$\mathbb{E}_0^{\pm} \cup \mathbb{R}_0 \cup E_0^{\pm} \cup R_0$, there exists a minimal proof in $E_\infty^{\pm} \cup R_\infty$. We argue by
contradiction that peaks, which are proof patterns of the form $s \leftarrow u \to t$ with
$u \succ s$ and $u \succ t$, cannot occur in the minimal proof. This implies that for all
terms $s, t \in \mathcal{T}(\Sigma)$, if $s \to^*_{E_\infty^{\pm} \cup F_\infty \cup B_\infty} t$ then $s \to^*_{E_\infty \cup F_\infty} \circ \leftarrow^*_{E_\infty \cup B_\infty^-} t$.
Moreover, the rewrite systems $E_\infty \cup F_\infty$ and $E_\infty \cup B_\infty^-$ are terminating as they are
contained in $\succ$. Finally, properties (i) and (iii) of Definition 2 follow from the
correctness of congruence closure [6] and Lemma 1. This establishes that
$E_\infty \cup F_\infty \cup B_\infty$ is a rewrite closure for $\mathbb{E}_0 \cup \mathbb{R}_0$. ■

Related Work and Other Remarks 

Note that the relation $\to^*_{E \cup F} \circ \leftarrow^*_{E \cup B^-}$ is decidable, as the rewrite systems
$E \cup F$ and $E \cup B^-$ are terminating [11]. Although the search for a proof of
the above form involves guessing the correct rewrite rules to apply, we can still
decide in polynomial time whether $s \to^*_{E \cup F} \circ \leftarrow^*_{E \cup B^-} t$, as (i) the non-deterministic
choices can be eliminated by maintaining subsets of $K$, that is, by doing subset
determinization along the computation, and (ii) the common context $C[\,]$ of
terms $s$ and $t$ such that $s \to^*_{E \cup F} C[c_1, \ldots, c_k] \leftarrow^*_{E \cup B^-} t$ can be determined by
starting with the largest common context of $s$ and $t$ and moving (only a polynomial
number of times) to a smaller context if necessary. Furthermore, using the result
that establishes a quadratic bound on the length of a derivation for the construction
of congruence closure [6], we can show that a state consisting of all persisting rules
can be reached using derivations of length $O(n^2 + n^{c+1})$, where $n$ is the size
of the input and $c$ is the maximum arity⁵ of any symbol in $\Sigma$. Reachability for
ground rewrite systems was shown to be decidable in polynomial time in [13].

The construction of an abstract rewrite closure is similar to performing “iterative
(or transitive) closure” on a ground tree transducer (representing the one-step
rewriting relation). However, there are the following differences: (a) whereas a
GTT is specified as a pair of bottom-up tree automata, an abstract rewrite closure
has an additional component, $E$, which keeps track of the term representation⁶
and the undirected equations, like $s \approx t$, in the input. Thus, the undirected
equations are treated using congruence closure and not as two distinct rules,
$s \to t$ and $t \to s$ (as would be done in the GTT approach); (b) our deduction rules
are local and have ordering constraints. The computation of an iterative closure
for GTT is done using exhaustive closure under the following rule (described in


⁵ Without loss of generality, the maximum arity $c$ can be treated as a constant.

⁶ The D-rules in $E$ (introduced by Extension) are interpreted as representing the term
DAG [6].
our framework as): “deduce $c \to d$ if $f(c_1, \ldots, c_k) \to c \in E \cup B^-$,
$f(c'_1, \ldots, c'_k) \to d \in E \cup F$, and for each $i$, $c_i$ and $c'_i$ represent some common term
in $\mathcal{T}(\Sigma)$”⁷. In [13], all possible transitivity inferences are explicitly done; (c) our
procedure is based on standard completion techniques and redundant inferences are
avoided; (d) the correctness argument is in terms of proof orderings; and (e) our
procedure can be extended to AC symbols, whereas tree automata techniques have
not been extended to such richer signatures. We explain the last three points further
below.

Correctness arguments based on proof orderings allow for clear identification 
of redundant inferences and compatible simplifications. To illustrate this point, 
consider an inference rule $(I, E, R \cup \{s \to t, t \to s\}) \vdash (I, E \cup \{s \to t\}, R)$, where
$s \succ t$. This inference rule⁸ is clearly sound. The completeness of the inference
system that includes this rule easily follows by observing that the deleted rules,
$s \to t$ and $t \to s$ in the $R$-component, have simpler proofs using the new rule in
the $E$-component. The new proofs are simpler in the third component.

3 Ground Cancellative AC Theories 

We next enrich the signature with additional AC symbols $\Sigma_{AC}$. Apart from
the associative and commutative axioms, the symbols $f \in \Sigma_{AC}$ are assumed to
satisfy the cancellative axioms (or inverse monotonicity axioms),

$$f(x_1, x_2, \ldots, x_m) \approx f(x_1, y_2, \ldots, y_m) \iff f(x_2, \ldots, x_m) \approx f(y_2, \ldots, y_m),$$

$$f(x_1, x_2, \ldots, x_m) \to f(x_1, y_2, \ldots, y_m) \iff f(x_2, \ldots, x_m) \to f(y_2, \ldots, y_m),$$

and the identity axiom $f(x, e_f) \approx x$, where $e_f$ is the identity element for $f$.

In the presence of AC-symbols, apart from D-rules and C-rules, we additionally
require A-rules of the form $f(c_1, c_2, \ldots, c_m) \to f(d_1, d_2, \ldots, d_k)$, where
$m, k \in \alpha(f)$. Unlike D-rules and C-rules, A-rules do not correspond to any
standard notion of a transition in bottom-up tree automata. The definition of a
rewrite closure can be extended by allowing for A-rules and replacing standard
rewriting by rewriting modulo AC [16].

We first consider the simple case of a cancellative abelian monoid. Let the signature
$\Sigma = \Sigma_{AC} = \{\cdot\}$ and let $K = \{e, c_1, c_2, \ldots, c_k\}$ be a finite set of constants,
where $e$ is the identity element for $\cdot$. We denote an application of $\cdot$ by juxtaposition
and use exponentiation notation, writing, for example, $c_1^2$ for the term $c_1 \cdot c_1$.
Moreover, we denote by $[s, t]$ the term that is the greatest common divisor of $s$
and $t$. Thus, $[c_1^2 c_2 c_3,\ c_1 c_2^2 c_4] = c_1 c_2$.

Let $R_0 = \{s_1 \to t_1, s_2 \to t_2, \ldots, s_n \to t_n\}$ be a set of directed rules over the
signature $\Sigma \cup K$, where each rule $s_i \to t_i$ is (when fully flattened and reduced
⁷ A stronger requirement (assuming each constant represents some term in $\mathcal{T}(\Sigma)$) is
$c_i \leftrightarrow^*_{\mathcal{C}} c'_i$, where $\mathcal{C}$ represents all the C-rules in $E \cup F \cup B$. This inference rule
is similar in spirit to the inference rule used in the Nelson-Oppen congruence closure
algorithm [6].

⁸ Having the rule $s \to t$ in $E$ is advantageous, as rules in $E$ can be used for
simplification (see the Simplification2 and Composition inference rules).
using the identity axiom) either a D-rule, a reverse D-rule, a C-rule, or an A-rule.
We first show how to “complete” this set. We associate a measure with every
rule. The measure will be a vector from the set $\mathbb{N}^n$, where $\mathbb{N} = \{0, 1, 2, \ldots\}$ is
the set of natural numbers. For the initial set $R_0$ of rules, we assign measures
as follows: the rule $s_i \to t_i$ is assigned the measure $(0, \ldots, 0, 1, 0, \ldots, 0)$,
where 1 is in exactly the $i$-th component.

We maintain the invariant that $[l, r] = e$ for all rules $l \to r \in R$ and hence,
we assume that $[s_i, t_i] = e$ for every $i = 1, 2, \ldots, n$. Let $\succ$ be the lexicographic
ordering, or the total degree lexicographic ordering [7].



ACC-Chaining:

$$\frac{(I, E, R \cup \{s \to t,\ u \to v\})}{\left(I, E, R \cup \left\{s \to t,\ u \to v,\ \dfrac{s\,u}{[s,v]\,[t,u]} \to \dfrac{t\,v}{[s,v]\,[t,u]}\right\}\right)}$$

if $[t, u] \ne e$ and either (a) $t \succ s$ and $u \succ v$; or (b) $s \succ t$, $u \succ v$, and $s \to t$ is not a
C-rule; or (c) $t \succ s$, $v \succ u$, and $u \to v$ is not a C-rule. The new rule is assigned
the measure $\alpha + \beta$, where $\alpha$ is the measure associated with the rule $s \to t$ and
$\beta$ is the measure associated with the rule $u \to v$.
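For a single cancellative AC symbol, ground terms over K are monomials, which can be represented as multisets of constants; the greatest common divisor [s, t] is then the elementwise minimum. The sketch below computes the conclusion of ACC-Chaining under our reading of it, namely the rule $su/([s,v][t,u]) \to tv/([s,v][t,u])$ with measure α + β. It is only an illustration: it uses Python's `collections.Counter` as the multiset type, `gcd` and `acc_chain` are hypothetical names, and the ordering side conditions (a)-(c) are not checked.

```python
from collections import Counter


def gcd(m1, m2):
    """[m1, m2]: greatest common divisor of two monomials (elementwise min)."""
    return m1 & m2


def acc_chain(rule1, rule2):
    """Chain s -> t (measure alpha) with u -> v (measure beta).

    Returns (lhs, rhs, measure) for su/([s,v][t,u]) -> tv/([s,v][t,u]) with
    measure alpha + beta, or None when [t, u] = e (no overlap). The ordering
    side conditions (a)-(c) are deliberately not checked.
    """
    (s, t, alpha), (u, v, beta) = rule1, rule2
    overlap = gcd(t, u)
    if not overlap:                      # [t, u] = e
        return None
    common = gcd(s, v) + overlap         # [s, v][t, u] divides both sides
    lhs = (s + u) - common
    rhs = (t + v) - common
    return lhs, rhs, tuple(x + y for x, y in zip(alpha, beta))


# Rules c1^2 -> c2 and c2^2 -> c1 with unit measures (1, 0) and (0, 1).
r1 = (Counter(c1=2), Counter(c2=1), (1, 0))
r2 = (Counter(c2=2), Counter(c1=1), (0, 1))
lhs, rhs, measure = acc_chain(r1, r2)
print(dict(lhs), dict(rhs), measure)   # {'c1': 1, 'c2': 1} {} (1, 1)
```

Chaining c1² → c2 with c2² → c1 yields c1c2 → e. This agrees with the cancellative reading: c1 · c1c2 = c1²c2 → c2² → c1 = c1 · e, and cancelling c1 gives c1c2 → e.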

In order to ensure termination, we need to identify and delete redundant 
rules. The measure vector helps in doing this. 



ACC-Collapse:

$$\frac{(I, E, R \cup \{s \to t,\ u' \to v'\})}{(I, E, R \cup \{s \to t,\ u \to v\})}$$

if $u' = s\,u$, $v' = t\,v$, and $\alpha < \beta$, where $\alpha$ and $\beta$ are the measures associated
with the rules $s \to t$ and $u' \to v'$, respectively. The rule $u \to v$ is assigned the
measure $\beta - \alpha$.

Along with the Deletion rule, this forms a set of transformations that can
be used to complete a given finite set of (reverse) D-, C-, and AC-rules over a
signature $\Sigma = \Sigma_{AC}$ containing exactly one cancellative AC function symbol.

Example 2. Consider the set $R_0 = \{c_1^2 \to c_2,\ c_2^2 \to c_1\}$ of directed rules. We can
complete this set as follows (we show only the third component of the state here,
as the other components remain unchanged):



Any rule subsequently deduced by chaining can be simplified by collapse and
no additional rules are added to the set $R_3$. Thus, the system $R_3$ is the desired
completion.





We say $s \to_c t$ if, and only if, there exists a term $u$ such that $s \cdot u \to^*_{R_0} t \cdot u$.
The reflexive-transitive closure of the relation $\to_c$ is denoted by $\to_c^*$.

Theorem 4 (Soundness). Suppose $s \to t \in R_i$, where $(\emptyset, \emptyset, R_i)$ is a state in
any derivation starting from state $(\emptyset, \emptyset, R_0)$. Then $s \to_c^* t$.

Theorem 5 (Completeness). Let $R_0$ be a finite set of (reverse) D-, C-, and
AC-rules over $\Sigma \cup K$. The set $R_\infty$ of persisting rules in any fair derivation
starting from the state $(\emptyset, \emptyset, R_0)$ is finite. Furthermore, if $s \to_c^* t$, then
there is a proof of the form $s \to^*_{AC\backslash F_\infty^e} \circ =_{AC} \circ \leftarrow^*_{AC\backslash (B_\infty^-)^e} t$.

We combine the inference rules for the individual cancellative AC symbols 
and the inference rules for uninterpreted ground terms to get a procedure for 
constructing a rewrite closure for a set of equations and rules over a signature 
containing cancellative AC function symbols [16]. There are a few technical
difficulties here, however. First, in the case of a monoid, the length of the measure
vector assigned to a rule was determined by the number of rules in the initial
$R$-component, $R_0$. In the general case, these rules are created by orientation
and moved from the $I$-component to the $R$-component. Secondly, in the case of
a monoid, all the C-rules in the $R$-component had exactly one measure vector
associated with them. In the case of a signature with $|\Sigma_{AC}|$ AC symbols, each
C-rule will have a measure vector associated with it for each $f \in \Sigma_{AC}$. Third, we
need an AC-compatible ordering that orients the D-rules in the right way. For
this purpose, we use the ordering defined in [15]. When comparing two terms
from a monoid, it reduces to the total degree lexicographic ordering. Finally, we
additionally need ACC-superposition and ACC-paramodulation rules; for details
and correctness see [16].

Other Remarks. The equational theory induced by a set of ground equations
over a signature containing (non-cancellative) AC symbols can be conservatively
represented by D-rules, C-rules, and A-rules [5]. If, however, we are interested in the
rewrite relation, then the problem becomes much harder, since the classical Petri net
reachability problem is equivalent to deciding the rewrite relation induced by a
set of ground rules over an abelian semigroup. A derivation using the inference
rules presented here does not converge in the case of abelian semigroups. For
instance, consider the Petri net with two places $c_1$ and $c_2$ and two transitions
cfc 2 -A cic| and cfc 2 — >■ cf c|. ACC-Chaining inferences (assuming a total degree 
lexicographic ordering with Ci >- C 2 ) yield infinitely many persisting rules cf C 2 -A 
C 1 C 2 , cfc 2 -A cic|, . . . , c"c 2 -A cic^”^. The reachability problem for petri nets 
was shown to be decidable in [14, 12]. 
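The correspondence with Petri nets can be illustrated with a bounded search. In the Python sketch below (an illustration only, not the decision procedures of [14, 12]), markings are exponent vectors, each transition is a ground rule pre $\to$ post applied modulo AC without cancellation, and a breadth-first search explores at most a given number of firings, so a negative answer only means that the goal was not found within the bound. The function names are hypothetical, and the transitions in the usage lines are an assumption: they reuse the rules $c_1^2 \to c_2$ and $c_2^2 \to c_1$ of Example 2, read as Petri net transitions.

```python
from collections import deque

def fire(marking, pre, post):
    """Fire transition pre -> post if enabled (one AC rewrite step), else None."""
    if any(m < p for m, p in zip(marking, pre)):
        return None
    return tuple(m - p + q for m, p, q in zip(marking, pre, post))

def reachable_within(start, goal, transitions, max_steps):
    """Breadth-first search over markings using at most max_steps firings."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        marking, depth = queue.popleft()
        if marking == goal:
            return True
        if depth == max_steps:
            continue
        for pre, post in transitions:
            nxt = fire(marking, pre, post)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return False  # not found within the bound; says nothing beyond it

transitions = [((2, 0), (0, 1)), ((0, 2), (1, 0))]  # c1^2 -> c2, c2^2 -> c1
print(reachable_within((4, 0), (1, 0), transitions, 10))  # True: c1^4 -> c1^2 c2 -> c2^2 -> c1
print(reachable_within((4, 0), (0, 1), transitions, 10))  # False
```

Here the set of reachable markings from $c_1^4$ is finite, so the search is exhaustive; in general the reachable set can be infinite, which is why a depth bound is needed for termination.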

The problem of deciding reachability in the case of a cancellative monoid is
related to solving a system of linear diophantine equations by “duality”. Consider
the system $\{4x_1 - x_2 - 2x_3 = 0,\ 3x_1 - 4x_2 + 5x_3 = 0\}$. This system can be
transformed into the three rewrite rules $c_1^4 c_2^3 \to e$, $e \to c_1 c_2^4$, and $c_2^5 \to c_1^2$, one for
each variable. The original system has a non-trivial solution in the non-negative integers if and only if
$e \to^+ e$, that is, $e$ rewrites to itself in at least one step. The converse
translation, from rewrite rules to a linear diophantine system, can be done similarly. This connection
is not surprising, since one motivation for considering the cancellative axiom for AC symbols comes from
AC-unification, where linear diophantine equations arise naturally.
4 Conclusion

We have presented a set of inference rules, derived from standard completion 
and non-symmetric completion, to construct a rewrite closure for a set of ground 
equations and rules over a signature that can possibly contain cancellative AC 
symbols. The procedure works over an extended signature, incorporates essential 
simplifications, and is terminating. 

There are several directions in which we envisage future work. The inference
rules can be extended by including rules for unification and for special kinds
of relations, such as the various path orderings. This would give abstract
transformation rules for constraint solving. Another possible extension is to
obtain decision procedures for ordered fields. In this context, the non-symmetric
relation will be interpreted as the ordering relation $>$ on the field elements. This
work can also be extended along the lines of tree automata techniques and could
be used to obtain efficient decision procedures for several properties of ground
rewrite systems, such as confluence.